A case report of evaluating a large-scale health systems improvement project in an uncontrolled setting: a quality improvement initiative in KwaZulu-Natal, South Africa
  1. Kedar S Mate1,2,
  2. Wilbroda Hlolisile Ngidi3,
  3. Jennifer Reddy3,
  4. Wendy Mphatswe3,
  5. Nigel Rollins3,4,
  6. Pierre Barker1,5
  1. Institute for Healthcare Improvement, Cambridge, Massachusetts, USA
  2. Department of Medicine, Weill Cornell Medical Center, New York, New York, USA
  3. 20,000+ Partnership, Department of Pediatrics, University of KwaZulu-Natal, Durban, South Africa
  4. Department of Maternal, Newborn, Child and Adolescent Health and Development, World Health Organization, Geneva, Switzerland
  5. Department of Pediatrics, School of Medicine, University of North Carolina, Chapel Hill, North Carolina, USA
  1. Correspondence to Dr Kedar S Mate, Department of Medicine, Weill Cornell Medical College, 525 E. 68th Street, New York, NY 10065, USA; kmate{at}


Objective New approaches are needed to evaluate quality improvement (QI) within large-scale public health efforts. This case report details challenges to large-scale QI evaluation, and proposes solutions relying on adaptive study design.

Study design We used two sequential evaluative methods to study a QI effort to improve delivery of HIV preventive care in public health facilities in three districts in KwaZulu-Natal, South Africa, over a 3-year period. We initially used a cluster randomised controlled trial (RCT) design.

Principal findings During the RCT study period, tensions arose between intervention implementation and evaluation design due to loss of integrity of the randomisation unit over time, pressure to implement changes across the randomisation unit boundaries, and use of administrative rather than functional structures for the randomisation. In response to this loss of design integrity, we switched to a more flexible intervention design and a mixed-methods quasiexperimental evaluation relying on both a qualitative analysis and an interrupted time series quantitative analysis.

Conclusions Cluster RCT designs may not be optimal for evaluating complex interventions to improve implementation in uncontrolled ‘real world’ settings. More flexible, context-sensitive evaluation designs offer a better balance of the need to adjust the intervention during the evaluation to meet implementation challenges while providing the data required to evaluate effectiveness. Our case study involved HIV care in a resource-limited setting, but these issues likely apply to complex improvement interventions in other settings.

  • Continuous quality improvement
  • Health services research
  • Randomised controlled trial
  • Evaluation methodology


High rates of preventable HIV infections illustrate the gap between results obtained in controlled research settings and those recorded in ‘real life’ health systems with restricted resources. Preventive measures can now reduce the risk of perinatally acquired HIV from approximately 25% to less than 2%, and global calls for eliminating perinatal HIV transmission have been issued by WHO and the United Nations.1–4 Yet, despite the promise of these interventions, only 50% of pregnant women are tested for HIV in the highest-burden countries in east and southern Africa, and only 68% of pregnant women living with HIV receive therapy containing antiretroviral medications in these countries.5 Failure to deliver well-established prevention of mother-to-child HIV transmission (PMTCT) interventions contributes to the nearly 370 000 babies infected with HIV each year.5 ,6

Evidence-based interventions, such as antiretroviral therapy for PMTCT, are typically taken from the research setting to large-scale implementation using conventional approaches that rely on detailed central planning, large-scale training programmes, and rigid, top-down, scale-up designs. These approaches have not uniformly succeeded, and long delays persist in implementation of evidence-based clinical care.7 Newer implementation approaches have turned to the fields of improvement and implementation science, as well as the hybrid disciplines of behavioural economics, psychology and social science. The study of these newer implementation approaches has, however, relied on traditional research and evaluation designs used for proof of concept, safety and efficacy studies in highly controlled environments.

Here we report our experiences in studying health systems improvement approaches to implementing and scaling up a public health programme, in this case focused on PMTCT, across three populous health districts in KwaZulu-Natal (KZN) Province, South Africa. For reasons described below, we changed the evaluation approach from a conventional, cluster randomised controlled design to a quasiexperimental, mixed-methods design. Our aim is to illustrate, via this case example, why such designs may be better suited to the evaluation of quality improvement (QI) implementation approaches in their early stages. Although research methodologists frequently refer to the mismatch between evaluation and intervention design, case reports describing this mismatch are rarely found in the literature.

Overall methods

The study, named ‘20 000+’ (the number of perinatal HIV infections that could be averted in KZN Province each year with an effective PMTCT programme), was a partnership between the KZN Provincial Department of Health, the University of KZN, and the Institute for Healthcare Improvement. Over a 3-year period, the partnership used a QI implementation approach to improve the delivery of the existing evidence-based PMTCT healthcare interventions.


Across all three districts in the study's catchment area (Ugu, Umgungundlovu and Ethekwini), antenatal HIV prevalence exceeded 40%.8 An earlier survey in KZN had shown the local mother-to-child HIV transmission rate to be 20.6%, compared with the internationally reported optimal (controlled study condition) rates of less than 2%.9–11 In response, the Department of Health sought a health systems intervention to reduce mother-to-child HIV transmission rates to below 5% by the year 2011.12

Multiple obstacles have contributed to KZN's difficulties in implementing PMTCT guidelines.13 While in some cases there were genuine resource constraints, the South African health system is better resourced with available drugs and trained personnel than many other health systems in sub-Saharan Africa.14 ,15 Review of PMTCT care processes suggested that the programme failed to reliably deliver the sequence of processes of care known as the PMTCT ‘cascade.’16 ,17 In one report, while a large number of pregnant women were HIV tested during pregnancy (88%), fewer received key laboratory tests (40%), and vital prophylactic treatments for women (57%) and their babies (15%).18
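The cascade figures quoted above can be read as a simple gap analysis. The following is a minimal sketch, not part of the study's own analysis: it takes the four coverage percentages from the cited report, treats each as an independent coverage figure (the source does not give denominators, so this is an assumption), and ranks the steps by shortfall. The step labels and the function name `coverage_gaps` are hypothetical, chosen for illustration.

```python
# Step coverage figures quoted in the text (reference 18).
# The step labels are shorthand invented for this sketch.
cascade = [
    ("HIV test during pregnancy", 0.88),
    ("key laboratory tests", 0.40),
    ("maternal prophylaxis", 0.57),
    ("infant prophylaxis", 0.15),
]


def coverage_gaps(steps):
    """Return (step, gap) pairs, largest shortfall first.

    The gap is 1 minus the reported coverage, rounded to two decimals.
    Each percentage is treated as an independent coverage figure, which
    is an assumption: the source does not state the denominators.
    """
    gaps = [(name, round(1.0 - covered, 2)) for name, covered in steps]
    return sorted(gaps, key=lambda pair: pair[1], reverse=True)


gaps = coverage_gaps(cascade)
for name, gap in gaps:
    print(f"{name}: {gap:.0%} not reached")
```

Ranking the steps this way mirrors how an improvement team might decide which part of the cascade to test changes on first; here the largest gap is in prophylaxis for babies.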

Quality improvement implementation approach

Seeking to more rapidly implement evidence-based antiretroviral interventions to prevent mother-to-child HIV transmission, the 20 000+ Partnership developed an implementation strategy based on QI methods used elsewhere in South Africa.19–21 This QI implementation approach was operationalised at two levels: district and facility. At the district level, ‘task teams’ consisting of district management staff, information officers and medical officers met with the 20 000+ staff on a monthly basis to review PMTCT process data and systems failures. These ‘task teams’ aimed at institutionalising the regular review of process data with the goal of sustaining the PMTCT improvements. From the outset, responsibility for all antiretroviral interventions to prevent mother-to-child HIV transmission in the three districts was assumed by the Department of Health.

At the facility level, the QI implementation approach involved facility-based improvement teams, including some combination of the clinic-based PMTCT professional nurses, HIV testing counsellors, data clerks and the facility operational manager. QI mentors from 20 000+ visited facilities every other week to train and mentor these teams in data-driven QI methods and management skills. Improvement teams were coached to use local performance data, displayed on run-charts, to identify gaps in clinical care, and then to use rapid-cycle QI methods (plan-do-study-act cycles, the Model for Improvement, process mapping and run-chart analysis) to test approaches to close these gaps.22 ,23 Improvement teams were encouraged to test their own ideas to improve reliability of care for each step in the PMTCT cascade. Components of the facility-level QI implementation approach are detailed in Table 1.22 ,24 ,25

Table 1

Components of the quality improvement framework


Ethics approval for the research study was sought and obtained from both the University of KZN Biomedical Research ethics committee and from the Centres for Disease Control institutional review board (IRB). A data safety monitoring board was established to comply with IRB recommendations.

Initial evaluation design

A randomised controlled trial (RCT) of QI interventions

The study named 20 000+ was initially designed as a stepped-wedge, cluster RCT, applied uniformly across the three districts, to quantitatively assess the effectiveness of QI methods applied at different levels of intensity. In the three districts, 222 health facilities providing PMTCT services were divided into clusters of facilities that were supervised by a single primary health clinic supervisor—a Department of Health employee who had line responsibility for supervising the quality of care at those facilities. There were 48 of these randomisation units at the start of the study. As all 222 facilities could not be engaged simultaneously, the start of the intervention was staggered over three successive start dates to initiate primary health clinic supervisors and their clinics into the study (figure 1).

Figure 1

Initial evaluation design.
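The staggered allocation described above can be sketched in a few lines. This is an illustrative sketch only, not the study's actual randomisation procedure: it assumes the 48 supervisor-level units were split evenly across the three start dates (the source does not give the split), and the unit names, the seed and the function name `assign_waves` are all hypothetical.

```python
import random


def assign_waves(unit_ids, n_waves=3, seed=2008):
    """Randomly allocate cluster units to staggered start waves.

    Units are shuffled with a fixed seed (arbitrary, for reproducibility)
    and then dealt out in turn, so wave sizes differ by at most one.
    """
    rng = random.Random(seed)
    shuffled = list(unit_ids)
    rng.shuffle(shuffled)
    waves = {wave: [] for wave in range(1, n_waves + 1)}
    for index, unit in enumerate(shuffled):
        waves[(index % n_waves) + 1].append(unit)
    return waves


# 48 randomisation units, as stated in the text; the labels are invented.
units = [f"supervisor_{i:02d}" for i in range(1, 49)]
waves = assign_waves(units)
for wave, members in waves.items():
    print(f"start date {wave}: {len(members)} units")
```

In a stepped-wedge design, units in later waves act as controls until their own start date, which is why the integrity of each unit over time matters so much for the analysis.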

The cluster randomised study was designed to answer the question of whether rapid-cycle QI methods could improve performance of PMTCT processes at a district level. The study was not designed or equipped to measure mother-to-child HIV transmission rates in the three districts on a continuous basis using conventional population sampling methods. Accordingly, the primary endpoints for the study were the process indicators that were tracked monthly with feedback provided to all participating clinics and to all district and provincial managers.

Challenges encountered

Efforts to implement the initial RCT evaluation design were complicated by a series of issues that motivated a change in both the intervention and evaluation design.

Motivation of study participants

The randomisation design disrupted our ability to engage willing participants at the start of the study. Typically, when piloting a new QI effort, willing participants interested in leading the effort, and enthusiastic about implementing it, would be selected for initial activities. Only if successful results were obtained from these early adopters would these results be leveraged to generate interest and enthusiasm for adoption of the intervention across a broader range of participants. The randomisation design targeted a random group of primary health clinic supervisors and their clinics, a number of whom were uninterested in the study and its methods. By contrast, many other district health staff demonstrated interest but could not be included, given the randomisation sequence. In one district, the primary health clinic supervisor role was disbanded, which disrupted the randomisation unit and abruptly shut down the study design across that entire district.

Limiting effects of the QI implementation approach to individual facilities when changes actually occurred at a district level

The intention of the RCT design was to limit bias and contamination. Yet the 20 000+ staff frequently observed ideas that originated within the facility-based QI teams diffusing naturally; this diffusion could not be limited to intervention sites, and at times conflicted with district priorities. For instance, in one group of ‘intervention’ clinics, changes to laboratory processing steps and transport systems had cut CD4 test result turnaround times from 7 weeks to less than 1 week. This change, however powerful and motivating, could not be intentionally spread by the 20 000+ staff to other area clinics, because the randomisation design prohibited it. Yet, since the transport system and the central laboratory were universal to the district, the change was automatically spread to the entire district despite the fact that more than half the facilities were not formally involved in the study at that time.

Using administrative rather than functional systems structures for the randomisation

The RCT design relied on one particular structure within the district health system, the primary health clinic supervisor and their associated clinics. While active in some instances, the primary health clinic supervisor-clinic relationship at times conflicted with more robust and better defined referral relationships between clinics and hospitals. Selection of an inappropriate and non-functional unit of randomisation confounded the implementation of the study design. Similarly, the study design prevented the 20 000+ staff from taking advantage of pre-existing, regular network meetings within the districts (such as monthly perinatal meetings) where clinics from both intervention and control arms gathered regularly.

Motivation of 20 000+ staff

As a result of being forced to spend time working with uninterested, unwilling intervention clinics, the 20 000+ quality mentors reported substantial time and energy wastage and occasional abuse and mistreatment. The 20 000+ staff noted that the study design prevented development of sustainable learning opportunities, and wasted the will and energy of enthusiastic health facilities. Worse yet, the 20 000+ staff felt that the existing design was eroding their own confidence in the methodology and their morale.

Evidence base for the interventions

While the healthcare interventions that were part of this study (PMTCT medication regimens and treatment protocols) have a strong and clear evidence base, the details of the QI implementation approach were much less clear, and their application in this particular context lacked a strong evidence base. This was perhaps the most important problem with the design of the RCT, as it introduced a great deal of site-to-site variation as the specific implementation approach underwent local adaptation.

Revised design

The problems executing the RCT evaluation design challenged the effective implementation of the intervention itself (ie, studying the intervention was interfering with delivering the intervention in the first place). Given these challenges, the study's Data Safety & Monitoring Board recommended abandoning the randomised nature of site selection and revised both the intervention and evaluation designs.

Non-randomised intervention

The intervention was redesigned to allow the 20 000+ staff to identify and work with any member of the district health team who demonstrated capacity and interest to lead PMTCT improvement activities. Natural referral systems between hospitals and their feeder clinics were examined, and pre-existing meetings (eg, monthly perinatal meetings) were used to ‘spread’ health systems improvement knowledge and expertise (figure 2). These referral units also followed ‘sub-district’ administrative boundaries within each of the three districts. The redesigned intervention allowed greater flexibility in using different approaches to target different cadres of health workers according to their interests, knowledge and specific clinical environments. The redesigned intervention characteristics, summarised in table 2, included simplification, standardisation, stronger mentorship, flexible leadership and use of natural systems linkages. Accordingly, a tailored QI implementation approach was taken in each district, summarised in table 3.

Figure 2

Subdistrict referral unit. A subdistrict always contains at least one secondary hospital and a group of primary health clinics. The secondary hospital may have a referral relationship with a tertiary care centre.

Table 2

Characteristics of the redesigned intervention

Table 3

Customised quality improvement (QI) approach to each district

Revised evaluation design

A mixed-methods quasiexperimental design was used to continue the evaluation of the 20 000+ Partnership after randomisation was abandoned. The quasiexperimental approach allowed intervention protocols to be adjusted at predetermined times during the course of the study (after the inclusion of the first subdistrict, and with each added subdistrict), so that the trial could be adjusted as contextual conditions changed.26 ,27 This permitted study staff to work with willing and engaged health units in more functionally defined health system structures, allowed customisation of the implementation to local context, and ameliorated the study staff burnout described above.

The revised experimental design allowed improved study execution. This was principally observed in facility-based QI visits being carried out with more regularity and with more participation from local staff. In a 6-month period prior to the change in design, facility visits were carried out by study team members 42% of the time (n=222 of 528 expected visits). In a similar 6-month period after the design changes, 76% of the visits were conducted (n=470 of 602 expected visits, p=0.002). Both quantitative and qualitative data collection depended in part on these visits and were strengthened as a result.

Interrupted time series methods, using both logistic regression and statistical process control charts, provided the analytic tools to demonstrate quantitative improvements in the revised evaluation design.28 Qualitative data collection allowed better understanding of the variations seen in observed results so that we could capture the role of context and relative strength of intervention implementation.29 These qualitative data were analysed and brought together to inform the development of ‘driver diagrams,’ which are visual representations of the study's content theory. Such driver diagrams were then compared across intervention locations to arrive at a better sense of not only whether the QI intervention worked in a particular location, but also why.30
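To make the statistical process control idea concrete, the sketch below builds a basic p-chart: a centre line and 3-sigma limits are estimated from a baseline period, and later months falling outside those limits are flagged as special-cause variation. It is a minimal illustration under stated assumptions, not the analysis used in the study. The monthly counts are invented, the six-month baseline is arbitrary, and the function name `flag_special_cause` is hypothetical.

```python
import math


def flag_special_cause(events, totals, baseline_months):
    """Flag months whose proportion falls outside 3-sigma p-chart limits.

    The centre line (p_bar) is the pooled proportion over the baseline
    months only; limits are recomputed per month from that month's
    denominator. Returns (p_bar, list of flagged month indices).
    """
    base_events = sum(events[:baseline_months])
    base_totals = sum(totals[:baseline_months])
    p_bar = base_events / base_totals

    flagged = []
    for month, (count, n) in enumerate(zip(events, totals)):
        sigma = math.sqrt(p_bar * (1.0 - p_bar) / n)
        lower = max(0.0, p_bar - 3.0 * sigma)
        upper = min(1.0, p_bar + 3.0 * sigma)
        proportion = count / n
        if proportion < lower or proportion > upper:
            flagged.append(month)
    return p_bar, flagged


# Invented data: women receiving a given cascade step out of monthly
# attendees. Months 0-5 are a hypothetical baseline; months 6-9 follow
# a hypothetical change in the process.
received = [40, 42, 38, 41, 39, 40, 56, 60, 66, 70]
attendees = [100] * 10

p_bar, flagged = flag_special_cause(received, attendees, baseline_months=6)
print(f"baseline proportion: {p_bar:.2f}")
print(f"months outside control limits: {flagged}")
```

A chart of this kind gives teams monthly feedback without waiting for an end-of-study comparison, which is the property the revised design relied on. A full interrupted time series analysis would additionally model level and slope changes at the intervention date, which this sketch does not attempt.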

Lessons learned

Our experiences in KZN of applying a cluster RCT design to evaluate the effectiveness of a QI approach to improve PMTCT demonstrate the challenges of applying conventional study designs to test complex large-scale programme implementation strategies in an environment that is subject to constant change. Research designs that do not recognise or cannot respond to variations in local context will be limited in their ability to capture the effects of health interventions on population health outcomes.31 Using conventional research approaches, we tried to establish a controlled environment for evaluating a health systems intervention across a large and varied geopolitical area—a process that ultimately conflicted with the contextual issues that the intervention sought to address.

As others have noted, probability assessments, including RCTs, that rely on randomisation and extrapolation of conclusions from one setting to others are not always practical, and may be unethical for evaluating large-scale interventions embedded in social systems.20 ,31–33 The RCT is an unsurpassed tool for assessing whether a clinical treatment has caused a biological effect; in such cases, the causal pathway is short and clearly understood.33 However, where the intervention's success depends on a long chain of intermediate steps,31 and where implementation is heavily affected by social and cultural factors,20 even the most internally valid RCT is rarely generalisable to other settings. Two features of our revised design addressed these issues: qualitative methods and driver diagrams allowed better accounting of contextual factors, and interrupted time-series analysis allowed an almost real-time quantitative understanding of the relative impact of these factors on the intervention.

In addition, in porous community-based settings that cannot be assiduously controlled, a routinely designed RCT may not be a suitable vehicle for evaluation.34 In our case, we learned that we could not, nor was it desirable to, limit the natural spread of knowledge about how to reliably implement evidence-based practices from intervention to control facilities.

Quasiexperimental designs, including pragmatic trial designs, that operate under ‘real life’ conditions and can add and subtract factors considered to affect outcomes are likely to be more successful in determining a cause-and-effect association between interventions and outcomes at a population level.31 In addition, analytic tools, such as continuous time series and statistical process control are likely to provide the type of constant feedback needed for implementation activities to adjust and improve with prevailing contextual circumstances.28 ,35–37 In our modified design, each district took on a different approach to making systems changes depending on the state of readiness for change and functionality of the districts concerned. While the three designs were tailored to the geographic, administrative and competency needs of the individual districts, all had a common set of ingredients derived from the field of QI for effecting improvement of the PMTCT programme.38

There are limits to quasiexperimental designs: the lack of a randomised intervention allows selection bias or participant motivation bias. In fact, our revised design sought motivation bias among healthcare personnel and district managers. The presence of this important bias, and the lack of tightly controlled concurrent comparison districts, make probability-based conclusions impossible. At best, our study, and similarly designed evaluations, will be able to ‘plausibly’ associate the improvements in process and outcome results with the interventions they seek to evaluate. We believe that if the logic model underlying the intervention is strong enough, a plausibility-based association is likely sufficient evidence to substantiate an association between the interventions and effects seen. This will not convince all audiences, but we believe that policy makers and many practitioners will be sufficiently convinced by a plausibility-based argument founded on a convincing logic model in favour of a particular intervention.

Our experience using an RCT-based evaluation design to study a QI intervention should not be misinterpreted to suggest that RCTs are entirely inappropriate in the field of QI. Indeed, there are several examples of well-designed randomised and controlled evaluations of well-defined, clearly delineated QI interventions, including some that include mixed-methods approaches.39 ,40 However, when attempting to evaluate QI as an approach to implementing a healthcare intervention in real time and in real-life circumstances, our results imply that formative studies using time series analysis may be more suitable to build an evidence base around a particular combination of QI tools. In settings where the implementation itself is the subject of the investigation (in so-called ‘implementation research’ studies), we believe that the experiences described above will help inform the selection of appropriate evaluation methods. We also believe that these lessons, while sourced from an HIV prevention programme taking place in a middle-income country, are broadly relevant across geographies and healthcare improvement efforts.

In summary, this case report presents our experience of using two different evaluation designs to understand the impact of a QI implementation approach to improve delivery of a large-scale public health intervention in a complex, heterogeneous, geopolitical environment. We have found that evaluation methods seeking to prove benefit of flexible health systems interventions, including QI interventions, must themselves be adaptive. Further research is needed to validate these adaptive context-sensitive methods for evaluation in ‘real life’ large-scale health systems. While our study was carried out in a resource-limited setting, we believe that challenges we faced in applying an RCT to establish the impact of a QI implementation approach would be experienced in any setting.


The authors would like to thank Jane Roessner, Frank Davidoff and Zoe Sifrim for their invaluable advice, review and contributions to this manuscript.



  • Contributions KSM participated in the study design and analysis, and drafted the initial manuscript. WHN carried out the study. JR carried out the study. WM participated in the study design, execution and analysis. NR conceived the study and participated in the study design and analysis. PB conceived the study, participated in the study design, analysis and drafting of the manuscript. All authors read and approved the final manuscript.

  • Funding Centers for Disease Control BF 061/08. Financial support and receipt of reports.

  • Competing interests None.

  • Ethics approval University of KZN Biomedical Research & Ethics Committee, South Africa; Centres for Disease Control IRB, US.

  • Provenance and peer review Not commissioned; externally peer reviewed.
