Article Text

Procedural instruction in invasive bedside procedures: a systematic review and meta-analysis of effective teaching approaches
  1. Grace C Huang1,2,
  2. Jakob I McSparron2,3,
  3. Ethan M Balk4,
  4. Jeremy B Richards5,
  5. C Christopher Smith1,
  6. Julia S Whelan6,
  7. Lori R Newman2,
  8. Gerald W Smetana1
  1. 1Division of General Medicine and Primary Care, Department of Medicine, Beth Israel Deaconess Medical Center, Boston, Massachusetts, USA
  2. 2Carl J. Shapiro Institute for Education and Research at Harvard Medical School and Beth Israel Deaconess Medical Center, Boston, Massachusetts, USA
  3. 3Division of Pulmonary and Critical Care, Department of Medicine, Beth Israel Deaconess Medical Center, Boston, Massachusetts, USA
  4. 4Center for Clinical Evidence Synthesis, Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, Massachusetts, USA
  5. 5Division of Pulmonary and Critical Care, Medical University of South Carolina, Charleston, South Carolina, USA
  6. 6Countway Library of Medicine, Harvard Medical School, Boston, Massachusetts, USA
  1. Correspondence to Dr Grace C Huang, Department of Medicine, Beth Israel Deaconess Medical Center, 330 Brookline Avenue, E/ES-212, Boston, MA 02215, USA; ghuang@bidmc.harvard.edu

Abstract

Importance Optimal approaches to teaching bedside procedures are unknown.

Objective To identify effective instructional approaches in procedural training.

Data sources We searched PubMed, EMBASE, Web of Science and Cochrane Library through December 2014.

Study selection We included research articles that addressed procedural training among physicians or physician trainees for 12 bedside procedures. Two independent reviewers screened 9312 citations and identified 344 articles for full-text review.

Data extraction and synthesis Two independent reviewers extracted data from full-text articles.

Main outcomes and measures We included measurements as classified by translational science outcomes: T1 (testing settings), T2 (patient care practices) and T3 (patient/public health outcomes). Due to incomplete reporting, we post hoc classified study outcomes as ‘negative’ or ‘positive’ based on statistical significance. We performed meta-analyses on the subset of studies sharing similar outcomes.

Results We found 161 eligible studies (44 randomised controlled trials (RCTs), 34 non-RCTs and 83 uncontrolled trials). Simulation was the most frequently published educational mode (78%). Our post hoc classification showed that studies involving simulation, competency-based approaches and RCTs had higher frequencies of T2/T3 outcomes. Meta-analyses showed that simulation (standardised mean difference (SMD) 1.54 vs 0.55 for studies with vs without simulation, p=0.013) and competency-based approaches (SMD 3.17 vs 0.89, p<0.001) were effective forms of training.

Conclusions and relevance This systematic review of bedside procedural skills demonstrates that the current literature is heterogeneous and of varying quality and rigour. Evidence is strongest for the use of simulation and competency-based paradigms in teaching procedures, and these approaches should be the mainstay of programmes that train physicians to perform procedures. Further research should clarify differences among instructional methods (eg, forms of hands-on training) rather than among educational modes (eg, lecture vs simulation).

  • Evidence-based medicine
  • Simulation
  • Patient safety

Introduction

Inpatient bedside procedures are frequent causes of morbidity and mortality. Nearly half of adverse events in the Harvard Medical Practice Study were associated with procedures, and an estimated 13% stemmed directly from technical errors.1 Initiatives to prevent procedural complications through the use of care bundles and checklists have considerably reduced infection rates, costs and mortality.2–4 These interventions aim to improve team-based processes and introduce evidence-based practices at a systems level.

Nevertheless, a major determinant of inpatient procedural outcomes is attributable to individual skill. Leape et al1 demonstrated that the largest category of provider-related error was due to technical factors. A large and growing body of quality improvement literature provides strong evidence of the benefit of prospective procedural instruction over a traditional (and haphazard) apprenticeship model.5 Simulation training has played a central role in this regard;6,7 however, educational approaches still vary widely.8,9 Previous systematic reviews have not focused on the non-surgical invasive bedside procedures encountered by, and expected of, the majority of physicians during the course of their training. Clinical leaders and faculty remain uncertain which approaches are most effective in improving procedural skills and reducing patient morbidity.

We conducted a systematic review to identify effective instructional strategies used in bedside procedural training. We hypothesised that studies involving simulation would be most strongly linked to clinical benefits, that is, improved procedural performance and reduced complication rates, and that studies with more rigorous research methodology would demonstrate larger benefits.

Methods

Study selection

Our primary question was, ‘For physician trainees, which teaching approaches to bedside procedural instruction are effective in improving clinical outcomes?’ We limited the scope of our study to procedures that (1) the majority of physicians encounter during ‘general medical training’, including medical school and residencies predominantly involving direct patient care; (2) occur primarily at the bedside (excluding operations); (3) are invasive (defined as more than minimal risk, with morbidity outside of the target organ and/or that typically require verbal or written consent); and (4) tend to be performed by physicians (rather than allied health professionals). Upon review of core competencies as defined by Residency Review Committees, academic societies and the certification boards of paediatrics, internal medicine, family medicine and emergency medicine,10–15 we included 12 procedures: arterial blood gas sampling, arthrocentesis, endotracheal intubation, lumbar puncture (LP), paracentesis, thoracentesis, central venous catheterisation (CVC), intraosseous line placement (IO), umbilical vessel catheterisation, suprapubic bladder catheterisation, nasogastric tube placement (NGT) and bag-mask ventilation (BMV).

Primary studies were eligible if they evaluated procedural training for physicians in training or in practice for any of the 12 included procedures. We excluded studies that (1) were not relevant to instruction or training, (2) were not relevant to the technical aspect of procedures, (3) did not include physicians or physician trainees, (4) originated from a developing country (due to different training practices), (5) described a survey study without a procedural training intervention, (6) did not describe the results of original research (eg, reviews, editorials) or (7) had as their focus the technical refinement of equipment used to perform or teach procedures. Our outcomes of interest were procedural skills (either in laboratory or clinical settings) and clinical outcomes. We made no restrictions on study design or language.

Data sources and searches

We complied with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines for all aspects of this systematic review. An experienced medical librarian conducted the literature search with input from the research team. We searched the following databases from inception through December 2014: PubMed/MEDLINE (National Center for Biotechnology Information, 1966–2014), EMBASE (Elsevier, 1974–2014), Web of Science (Thomson, 1900–2014) and the Cochrane Library (Wiley, 1994–2014). Our strategy combined a large set of terms describing procedures, such as ‘ambu bag’, ‘paracentesis’, ‘suprapubic’ and ‘arterial blood gas’, with a smaller set of educational terms, for example, ‘education, educate, teach, instruct, instruction’. We used both MeSH and free text terms and limited results to the date range 1 January 1966 (or database inception) to 12 December 2014. Online supplementary appendix table 1 details the search strategy for PubMed; we constructed comparable search strings for the other databases.
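
For illustration only, the Boolean structure of such a strategy can be sketched as follows (Python; the term lists are abbreviated examples drawn from the text, and the full PubMed strategy appears in online supplementary appendix table 1):

    # Illustrative sketch of the search logic only; term lists are abbreviated
    # examples from the text, not the full strategy.
    procedure_terms = ['"ambu bag"', 'paracentesis', 'suprapubic',
                       '"arterial blood gas"']  # the real set was far larger
    education_terms = ['education', 'educate', 'teach',
                       'instruct', 'instruction']

    # Each set is OR'd internally; the two sets are AND'd together.
    query = '({}) AND ({})'.format(' OR '.join(procedure_terms),
                                   ' OR '.join(education_terms))
    print(query)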

Data extraction and quality assessment

All titles and abstracts were independently reviewed by the primary author (GCH) and one additional investigator (JM, JR, LN or CS). We reconciled disagreements about inclusion by discussion and subsequent consensus.

In a collaborative, iterative process, we developed an extraction instrument to collect data in the following areas: study design, learner characteristics, educational approach, instructor characteristics, rater characteristics, assessment approach (medium of education, method, timing), subject population, outcomes and study quality. We specified educational approach in three ways:16 (1) mode, referring to the overall medium of education (eg, computer, in-person, simulator); (2) educational supplements, referring to any instructional materials that complement the training but do not represent the primary form of instruction (eg, supplemental texts, videos, lectures); and (3) instructional features, referring to specific teaching principles (eg, incorporating feedback). For study quality and the risk of bias within individual studies, we adapted the Cochrane risk of bias tool for randomised trials17 and the Newcastle-Ottawa instrument for cohort and case–control studies,18 to accommodate study characteristics relevant to this body of literature. We also used the Medical Education Research Study Quality Instrument (MERSQI) tool, which has been used previously to classify the quality of educational and quality improvement research.19 We piloted the data collection instrument among reviewers and then trained reviewers by independently extracting the same articles and reaching consensus on coding. Subsequently, eligible studies were independently extracted by the primary author (GCH) and one additional investigator. We reconciled discordant reviews by discussion and subsequent consensus.
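
To illustrate this three-way specification, a single extraction record might be coded as in the following sketch (the field names and values are hypothetical, not those of the actual instrument):

    # Hypothetical extraction record; field names are illustrative only.
    record = {
        'study_design': 'RCT',
        'mode': 'simulator',                   # overall medium of education
        'supplements': ['lecture', 'video'],   # complementary materials
        'features': ['hands-on', 'feedback'],  # specific teaching principles
    }
    print(record['mode'], record['supplements'], record['features'])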

Data synthesis and analysis (qualitative)

We compiled the characteristics of all eligible studies. We relied upon the translational science model as applied to educational research, namely T1 outcomes referring to performance in testing settings, T2 outcomes referring to patient-level metrics reflecting skill in the clinical setting (eg, time to intubation) and T3 outcomes referring to clinical events (eg, bleeding, infection, airway loss).20 Because reporting of SDs and other measures of variance was commonly incomplete, for many outcomes we were unable to conduct meta-analyses; therefore, we made a post hoc decision to classify study outcomes as ‘negative’ or ‘positive’ based on whether there was statistical significance reported for each of the types of outcomes (eg, T1). Lastly, to support the notion that bedside procedures could be considered as a cohesive whole, we conducted a comparison to determine whether outcomes differed across types of procedures.
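
As a minimal sketch of this post hoc rule (the record structure below is hypothetical, not our study database), each reported outcome carries its translational level and is dichotomised on reported statistical significance:

    # Hypothetical records for illustration; 'level' is the translational
    # classification (T1/T2/T3) and 'significant' records whether the study
    # reported a statistically significant result for that outcome.
    def classify(outcome):
        label = 'positive' if outcome['significant'] else 'negative'
        return outcome['level'], label

    study_outcomes = [
        {'level': 'T1', 'significant': True},   # eg, checklist score in a testing setting
        {'level': 'T2', 'significant': False},  # eg, time to intubation on patients
        {'level': 'T3', 'significant': True},   # eg, complication rate
    ]
    print([classify(o) for o in study_outcomes])
    # [('T1', 'positive'), ('T2', 'negative'), ('T3', 'positive')]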

Data synthesis and analysis (quantitative)

We conducted meta-analyses on the subset of studies reporting outcomes categorised as T1, T2 or T3. To qualify for meta-analysis, a study had to be a controlled trial (with either a randomised or a non-randomised control group) that reported all relevant numeric outcomes sufficient to calculate risk ratios (RRs) for categorical variables and measures of variance for continuous outcomes. We conducted random effects model meta-analyses of outcomes for which there were at least three such studies with similar metrics (eg, success rates, pass rates, checklist scores), comparing several variables: the educational mode and the instructional features. We used RRs for categorical variables and the standardised mean difference (SMD; the difference between means divided by the pooled SD) for continuous variables in reporting effect sizes. We performed univariable random effects model meta-regression to examine the contribution of individual study features (simulation, lecture, hands-on, video, competency-based, repetition, feedback, study design) on the given outcome. We used Stata V.12 (StataCorp, College Station, Texas, USA) for all statistical analyses.
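
The analyses were run in Stata; purely as a language-agnostic illustration of the formulas named above, the following Python sketch computes SMDs (difference in means divided by the pooled SD), log RRs, and a random effects pooled estimate. It uses the common DerSimonian-Laird estimator of between-study variance (an assumption on our part; the text specifies only a random effects model) and hypothetical inputs throughout.

    # A sketch of the standard formulas with hypothetical inputs; not the
    # authors' Stata code.
    import math

    def smd(m1, sd1, n1, m2, sd2, n2):
        # Standardised mean difference: difference between means / pooled SD
        pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2)
                              / (n1 + n2 - 2))
        d = (m1 - m2) / pooled_sd
        var = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))  # approx variance
        return d, var

    def log_rr(events_t, n_t, events_c, n_c):
        # Risk ratios are pooled on the log scale
        rr = (events_t / n_t) / (events_c / n_c)
        var = 1/events_t - 1/n_t + 1/events_c - 1/n_c
        return math.log(rr), var

    def pool_random_effects(effects, variances):
        # Inverse-variance fixed effect, then DerSimonian-Laird tau^2
        w = [1 / v for v in variances]
        fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
        q = sum(wi * (e - fixed)**2 for wi, e in zip(w, effects))  # Cochran's Q
        c = sum(w) - sum(wi**2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - (len(effects) - 1)) / c)  # between-study variance
        ws = [1 / (v + tau2) for v in variances]
        pooled = sum(wi * e for wi, e in zip(ws, effects)) / sum(ws)
        se = math.sqrt(1 / sum(ws))
        return pooled, (pooled - 1.96 * se, pooled + 1.96 * se)

    # Hypothetical checklist-score studies: (mean, SD, n) for trained vs control
    studies = [((85, 10, 20), (75, 12, 20)), ((90, 8, 15), (80, 9, 15)),
               ((70, 15, 30), (68, 14, 30))]
    est = [smd(*t, *c) for t, c in studies]
    print(pool_random_effects([e for e, _ in est], [v for _, v in est]))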

Results

Of the 9312 citations screened, 344 met preliminary eligibility criteria. After full-text review, 161 studies were eligible for analysis. The study flow is summarised in figure 1.

Summary characteristics

Characteristics of all included articles are summarised in online supplementary table S1.21–181 The most common single procedure represented in the studies was intubation (52 studies), followed by CVC (35 studies), LP (15 studies), arthrocentesis (10 studies), NGT (6 studies), IO (5 studies), BMV (4 studies), thoracentesis (3 studies), paracentesis (1 study) and suprapubic bladder catheterisation (1 study). Thirty studies (19%) included more than one procedure. No study examined umbilical vessel catheterisation alone.

The study designs comprised 44 randomised controlled trials (RCTs), 34 non-randomised controlled trials (NRCTs), 81 single-group studies without controls (eg, pre–post designs) and two case–control studies (ie, an assessment of learner outcomes with retrospective determination of whether they had undergone procedural training). Among studies, 103 involved resident trainees and 44 involved medical students. Specialties, where specified, included 38 studies with learners in internal medicine, 27 in anaesthesia, 24 in emergency medicine, 15 in surgery, 24 in paediatrics and 5 in family medicine.

The majority of studies were multifactorial in their educational approaches. Among educational modes, simulation was by far the most frequent (78%), followed by computer-based instruction (7%). Among educational supplements, lectures were most frequently used (53%), followed by videos (40%), advanced image-based guidance (eg, ultrasound, fibreoptic visualisation) in 32% of studies and paper-based materials in 18% of studies. Instructional features included a hands-on approach in 85% of studies, repetitive practice in 67%, live demonstration in 43% and feedback in 45%. Competency-based instruction (training in which learners must reach predefined thresholds before completing training, such as mastery learning148) accounted for 24% of studies. Immersive simulation, in which part-task trainers were used in a simulated clinical scenario, represented 9% of studies.

Procedural trainers were largely professionals in active practice, with only six studies relying on residents as instructors. Trainers’ specialty backgrounds resembled those of learners in range and frequency but were not reported in 51 of 161 studies.

With respect to the mode of assessing skill after training, simulation was most common (48%), followed by live patients (32%). For assessment methods, performance checklists were common (50%), as was self-reported procedural comfort (38%). Knowledge tests were used less frequently (24%). Among the subset of 68 controlled studies (RCTs and NRCTs), the types of outcomes measured differed; among 27 studies reporting categorical outcomes, success rate was most common, most often referring to the overall rate of completion of the procedure, particularly in intubation and CVC studies (see online supplementary appendix table 2).

Complication rates were rarely reported (eight studies), and the choice of outcome varied even within the same type of procedure (eg, measuring both oesophageal perforation and mucosal injury for intubation). Of the 37 controlled studies with continuous outcomes (see online supplementary appendix table 3), checklist scores (23 studies) were most frequent, followed by time to completion of procedure (10 studies), knowledge test scores (9 studies) and procedural comfort (3 studies). Among studies, 101 (63%) entailed immediate postintervention assessment and 106 (66%) assessed skill retention, with the timing ranging from 2 weeks to 12 months.

Qualitative analysis of controlled studies

Dichotomisation of controlled studies (RCTs or NRCTs) as either positive (statistically significant results) or negative (non-significant results) yielded positive findings in 27 of 41 studies with T1 outcomes, 12 of 20 studies with T2 outcomes and 10 of 22 studies with T3 outcomes (table 1).

Table 1

Controlled studies with qualitative results by study features where the educational feature is part of the intervention (rather than the control)

Among educational modes, the percentage of studies with positive T2 or T3 outcomes was comparable among simulation (59%), video (63%) and lectures (67%). Studies including live demonstration (50% positive), hands-on approach (58%), computers (50%) or real patients (40%) had moderate frequencies of positive T2 or T3 outcomes. Among instructional features, studies involving competency-based approaches (70% positive) had higher proportions of positive T2 or T3 outcomes than repetition (58%) or feedback (57%). Because studies did not generally compare educational modes to each other, there were no direct comparisons. Studies with more than one additional educational mode (eg, lecture plus video as part of the intervention; 60% positive) had more positive T2 or T3 outcomes compared with studies with a single educational mode (44%). Also, 11 of 44 RCTs (25%) had positive patient outcomes compared with 8 of 34 (24%) NRCTs. Studies with generalists (eg, internists) as trainers (38%) had lower proportions of positive T2 or T3 outcomes compared with studies of specialists as trainers (77%).

Across study types, we found a comparable representation of positive outcomes in controlled and non-controlled studies. The same was true across types of procedures (see online supplementary appendix table 4).

Among RCTs, bias due to incomplete or absent rater blinding was frequent, but other sources of bias were low (see online supplementary appendix table 5). Among NRCTs, non-equivalent control groups were problematic in one-third of studies, as was a lack of adjustment for other factors (two-thirds of studies). The MERSQI assessment for all studies confirmed that most studies took place at a single institution and also indicated that data collection instruments frequently lacked basic validity evidence.

Quantitative analysis of controlled studies

A subset of controlled studies had sufficiently complete reporting and similar metrics to be amenable to meta-analysis. Seven studies shared the continuous variable of checklist scores; the summary SMD was 1.25 (95% CI 0.33 to 2.17). For the five studies including simulation as part of the intervention, the summary SMD was 1.54 (95% CI 0.32 to 2.75), compared with 0.55 (95% CI −1.23 to 2.32) for the two studies without simulation (p=0.013) (figure 2A).

Figure 2

Meta-analysis of controlled studies with continuous outcomes and checklist scores (A) by educational approach and (B) competency-based approaches.

Similarly, the SMD was 3.17 (95% CI 2.55 to 3.78) for the one study using a competency-based approach compared with 0.89 (95% CI 0.19 to 1.59) for the six studies without (p<0.001) (figure 2B).

Among other characteristics of instruction, hands-on training (p=0.024), video (p=0.009) and repetition (p=0.034) were also effective (see online supplementary appendix table 6). Heterogeneity among the studies with continuous variables was substantial at 92%.

For categorical variables amenable to meta-analysis, the pooled RR, signifying the increased likelihood of success with the intervention, was 1.13 (95% CI 1.09 to 1.17) across the 15 studies examining success rates. The RR was 1.27 (95% CI 1.08 to 1.50) for the 10 studies including simulation compared with 1.13 (95% CI 0.98 to 1.30) for the five studies without simulation as an added intervention (p=0.28 for the comparison). The RR was 1.38 (95% CI 1.13 to 1.68) for the eight studies relying on a competency-based approach compared with 1.09 (95% CI 0.96 to 1.24) for the seven studies without (p=0.102). No other characteristic of instruction showed a statistically significant benefit (see online supplementary appendix table 6). Heterogeneity among the studies with categorical variables was substantial at 72%.
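
The heterogeneity percentages above follow the convention of the I² statistic, the proportion of total variability across studies attributable to between-study differences rather than chance. Assuming that convention, I² can be derived from Cochran's Q as in the following sketch (Python, hypothetical inputs):

    # I^2 from Cochran's Q: percentage of total variability across studies
    # attributable to between-study heterogeneity rather than chance.
    def i_squared(effects, variances):
        w = [1 / v for v in variances]
        pooled = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
        q = sum(wi * (e - pooled)**2 for wi, e in zip(w, effects))
        df = len(effects) - 1
        return 0.0 if q <= 0 else max(0.0, (q - df) / q) * 100

    # Hypothetical log risk ratios and variances, for illustration only
    print(round(i_squared([0.24, 0.31, 0.05, 0.45],
                          [0.002, 0.004, 0.003, 0.005]), 1))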

Discussion

In our systematic review of bedside procedural instruction, we characterised a large body of literature and found it to be quite heterogeneous in quality. The research base was dominated by studies without control groups and by study designs with significant risk of bias. Nevertheless, our review of procedural training approaches reveals the common use of simulation as both a training and an assessment tool. Beyond offering insights into how procedural research has been, and by extension should be, conducted, our work shows that the evidence is strongest for the use of simulation and competency-based teaching paradigms in teaching procedures effectively. To our knowledge, this is the first systematic review examining studies across a variety of non-surgical procedures, specialties and teaching approaches.

Our finding that the existing corpus of literature in procedural instruction is heavily populated by studies with limited generalisability is similar to that of research investigating educational practices as a whole.182 Obstacles to attaining a higher level of rigour in education include an undue reliance on pre–post designs, small sample sizes, lack of funding or infrastructure to support multicentred collaborations, and data collection instruments without demonstrated validity. Additionally, many studies in medical training are designed to rationalise the use of additional educational interventions (eg, lecture vs lecture plus simulation) rather than to clarify how they should be used (eg, structured vs unstructured feedback);183 the latter approach is preferable in contributing to the science of learning. Experimentation in the clinical arena raises challenges above and beyond other educational settings; learners cannot be physically confined to a single study group, and withholding training that has the potential to reduce patient harm is difficult to justify. That said, several well-conducted studies have been exemplars, accomplishing this end with clinically significant results.59,89,97 Thus, our characterisation of the current literature suggests that non-controlled studies, particularly in simulation, have reached a saturation point.

The instructional approaches we identified as effective support general educational principles.184 Adult learners seek relevant, practical experiences that require active participation, and they benefit from reflective practices. Hands-on approaches offer learners physical training in performing procedures and opportunities to rehearse these skills in context without the risk of patient harm. Goal orientation, also a characteristic of adult learning, is satisfied by competency-based instruction, which enforces high standards of training rather than assuming that everyone who is taught learns the skill. Similarly, repetition is considered essential to psychomotor learning as a means to attain muscle memory.185–188 Video provides the visualisation necessary to acquaint oneself with a procedure, as is included in many theories of technical skill acquisition.189,190 In sum, the approach that appears to be most effective in aligning technical training with the needs of adult learners is high-quality simulation that includes repetitive practice and mastery learning, supplemented by visual aids.

Even within a diverse literature base, we uncovered trends in how procedural education research is currently conducted. First, checklists were quite common compared with global ratings, underscoring the utility of breaking down technical processes into discrete components for step-by-step training and for objective assessment. Of note, some evidence has emerged suggesting that global ratings may be more sensitive in determining competence in the procedural realm.191 Second, competency-based approaches, while both educationally sound and supported by evidence, were relatively uncommon. The additional training time, instrument development and planning required may have been prohibitive for some groups, but in the future we anticipate that mastery learning will become the reference norm for procedural instruction. We also confirmed that a minority of studies examined clinical outcomes such as complication rates; most investigations are likely underpowered for this metric due to constraints in sample size and low baseline complication rates. Lastly, we found that study outcomes were similar across procedures, attesting to the notion that procedural training is a generalisable concept. All procedures, whether simple or complex, involve a stepwise progression of technical steps for which knowledge of anatomy, familiarity with equipment and fluency of movement are equally important. This particular observation has important implications for policymakers and for future researchers, namely that successful strategies to improve procedural education may not differ by procedure.

Our findings are consistent with those of other systematic reviews that evaluated subsets of studies addressing procedural instruction. In particular, Cook et al9,192,193 found that mastery learning, simulation, repetition and feedback were associated with larger effect sizes. Our results showing poor reporting quality, a considerable proportion of studies at risk of bias and limited validity evidence are also echoed in previous work.194 Our work, however, focused on technical skills and procedures expected of the majority of physicians and extended beyond simulation (the focus of previous systematic reviews), and our instructional design features differ in categorisation and breadth.

The literature had several important limitations. Among the procedures required of generalists, there was a preponderance of certain procedures (ie, intubation and CVC) and a paucity of others (eg, paracentesis). Subject types varied widely, ranging from medical students to practising physicians, with mixes of specialties. Many studies still compare different educational modes (eg, simulation vs lecture), despite the fact that such comparisons are fundamentally confounded183 and do not elucidate whether certain approaches are better than others.16 Power calculations within studies were largely absent. Due to incomplete reporting, we dichotomised research outcomes as ‘positive’ or ‘negative’ as defined by study authors, but this approach resulted in an inevitable loss of data precision. Poor reporting quality was also widespread with regard to documentation of the training process and even of the subjects themselves. Because of publication bias, our results fundamentally demonstrate not what is commonly used to train physicians in procedures, but what is commonly reported. Of note, a funnel plot was not possible owing to the lack of standardised effect estimates across all studies. Also, included studies did not examine external factors that may influence the choice of procedural instruction, such as regulatory needs, institutional infrastructure and resource availability.

Generalists are still expected to know how to perform invasive procedures; however, with the increasing specialisation of medicine, their volume of experience continues to decrease.195 Given the accumulating data on procedural training and simulation, we can begin to identify best practices for procedural teaching. Considering our results and those of other systematic reviews, we propose the following recommendations for educators and researchers (box 1).

Box 1

Recommendations for teaching and studying bedside invasive procedures

  • Use simulation where available.

  • Teach procedures using competency-based methods such as mastery learning and deliberate practice.

  • The maturity of the field suggests that randomised controlled trials in procedural instruction should be chosen over weaker study designs (eg, pre–post designs).

  • Future research efforts should examine differences among instructional methods (eg, step-by-step feedback vs unguided coaching) rather than among educational modes (eg, video vs lecture).

The findings of this review will guide faculty and leaders who are responsible for training and maintaining the procedural skills of physicians and trainees and should serve as a roadmap for future research.

Supplementary materials

  • Supplementary Data

Footnotes

  • Correction notice This article has been corrected since it was published Online First. GWS's institutional attribution has been corrected.

  • Contributors GCH contributed to conception of the study design, data acquisition, analysis, interpretation, writing of the manuscript and is responsible for the overall content as guarantor. JM contributed to data acquisition, interpretation and analysis and critical revision of the manuscript. EB contributed to data interpretation and analysis and writing of the manuscript. JR contributed to data acquisition, interpretation and critical revision of the manuscript. CSS contributed to data acquisition, interpretation and critical revision of the manuscript. JW contributed to data acquisition and writing of the manuscript. LN contributed to data acquisition, analysis and critical revision of the manuscript. GS contributed to conception of the study design, obtaining of funding, data acquisition, analysis, interpretation, and writing of the manuscript.

  • Funding Picker Foundation.

  • Competing interests None declared.

  • Ethics approval Ethics approval was waived by the Institutional Review Board at Beth Israel Deaconess because this work did not represent human subjects research.

  • Provenance and peer review Not commissioned; externally peer reviewed.