Simulations in the United States Medical Licensing Examination™ (USMLE™)
G F Dillon (1), J R Boulet (2), R E Hawkins (1), D B Swanson (1)

(1) National Board of Medical Examiners, Philadelphia, Pennsylvania, USA
(2) Educational Commission for Foreign Medical Graduates, Philadelphia, Pennsylvania, USA

Correspondence to: Gerard F Dillon PhD, 3750 Market St, Philadelphia, PA, 19104, USA; gdillon@nbme.org

Abstract

Over the last several years there has been much attention focused on the detection and remediation of problems that pose potential threats to patient safety and that interfere with the provision of effective care. It has been noted that changes in medical education and assessment are integral to eventual improvement in this area. Within the assessment system used to licence physicians in the United States, there has been an evolution of assessment formats intended to improve the measurement of knowledge and skills, including the recent development of computer based patient simulations and clinical skills assessments. A number of new testing formats intended to further enhance assessment of critical knowledge and skills will be available in the near future.

  • CCS, computer based case simulation
  • CK, clinical knowledge
  • CS, clinical skills
  • ECFMG, Educational Commission for Foreign Medical Graduates
  • IOM, Institute of Medicine
  • MCC, Medical Council of Canada
  • MCQs, multiple choice questions
  • NBME, National Board of Medical Examiners
  • PMPs, patient management problems
  • SPs, standardised patients
  • USMLE, United States Medical Licensing Examination
  • medical licensing
  • physician assessment
  • simulations
  • standardised patient

Although the specific context and processes vary, a number of organisations and groups around the world share a similar responsibility: to determine whether physicians have the skills and knowledge requisite for the safe and effective care of patients. This determination is typically aided by the results of an assessment programme, the extent and complexity of which will vary depending upon the nature of the knowledge and skills to be assessed, upon the psychometric challenges that such assessments bring, or upon a variety of economic, political, legal, ethical, and other issues. To the benefit of these efforts, advances in technology and measurement science have allowed more precise measurement of areas already addressed, and have permitted the assessment of areas not previously addressed because of practical or psychometric difficulties. These advances are particularly useful as organisations involved in such assessments turn their attention to issues identified by the medical community as critical to patient safety. Examples of such issues can be seen in the recent series of reports by the Institute of Medicine (IOM), which have highlighted fundamental problems in the healthcare system. These reports suggest that the cited problems threaten the quality of patient care in the United States.1,2 Recognising the influence that licensing examinations have on perceptions about the critical skills and knowledge that are necessary to combat threats to the quality of healthcare, the IOM reports also include specific recommendations for the licensing examination system.3

From a test design perspective, some of the IOM recommendations present implementation challenges. For example, the IOM recommended focus on medical teams seems out of place in the context of a medical licensing system aimed at individual clinicians. The primary focus of the medical licensing examination is on an individual’s proficiency at playing a defined role on that team. Nevertheless, the idea of “team” can become an important aspect of the medical context presented in test material, especially in the newer performance assessments that have become an important component in modern testing. Such assessment formats can create a sense of team through carefully scripted scenarios in the unfolding simulations and through the requirements of effective communication of findings after clinical encounters.

This paper is intended to provide a review of the examination system currently used as part of the requirements for a medical licence in the United States and, within this context, to describe attempts to assess the knowledge and skills that might reduce the likelihood of problems in healthcare delivery. Relatively new assessment formats are reviewed, including those intended to simulate a real physician–patient encounter. The evolution and future direction of such formats are discussed.

THE US MEDICAL LICENSING SYSTEM

A licence to practice medicine in the United States is granted by the licensing authority for the state in which the individual intends to practice. The licence is for the general practice of medicine and does not restrict the individual physician in terms of area of specialisation. In the process of granting the initial medical licence the state requires that the individual meet certain educational and assessment requirements. For medical students trained in the United States or Canada, the educational requirement is met by graduation from an accredited medical school. Students trained outside the United States or Canada must obtain certification by the Educational Commission for Foreign Medical Graduates (ECFMG) which, in part, requires documentation of undergraduate medical training.

Key messages

  1. Physician assessment is a critical feature in the process intended to determine readiness for independent practice.

  2. Recent reports on medical errors recognise the role that assessing physician readiness can have in reinforcing the skills and knowledge important to reducing these problems.

  3. Testing formats currently exist that address a variety of skills and knowledge critical to the safe and effective practice of medicine.

  4. The challenges to developing reasonable and useful testing formats, though considerable, continue to be successfully addressed by the measurement community.

  5. Advances in measurement technology and science can and will allow the measurement of important skills and knowledge that were not previously addressed.

For students receiving the MD degree from a US or Canadian medical school and for all students trained outside of the United States, the assessment requirement for a medical licence is currently met by successful completion of the United States Medical Licensing Examination (USMLE). The USMLE programme is jointly sponsored by the National Board of Medical Examiners (NBME) and by the Federation of State Medical Boards (FSMB). It was first administered in 1992, representing, at that time, a replacement for the multiple examination pathways previously available to licence allopathic physicians.

USMLE is intended to assess a physician’s ability to apply knowledge, concepts, and principles, and to demonstrate fundamental patient centred skills, that are important in health and disease and that constitute the basis of safe and effective patient care. There are three components, or “Steps,” in the USMLE examination programme.4

  • Step 1 focuses on the concepts of science basic to the practice of medicine, with special emphasis on principles and mechanisms underlying health, disease, and modes of therapy. Step 1 is a one day, computer delivered examination, made up of multiple choice questions (MCQs).

  • Step 2 has two components. The first, known as the clinical knowledge (CK) examination, is a one day, computerised MCQ test intended to assess whether the individual possesses the medical knowledge and understanding of clinical science considered essential for the provision of patient care under supervision. The second component of Step 2 is the clinical skills (CS) component. Step 2 CS is a standardised, patient based examination intended to directly assess the examinee’s data gathering and communications skills. Step 2 CS is described in more detail in a subsequent section.

  • Step 3 is a two day examination combining MCQs and computer based case simulations (CCSs). It is intended to assess whether the individual can apply medical knowledge and understanding of biomedical and clinical science essential for the unsupervised practice of medicine. The CCS format will also be the focus of discussion in a later section.

In the short history of USMLE, and in the longer history of testing in the medical licensing system generally, there have been numerous attempts to develop assessment formats that provide more authentic representations of patient care. These efforts, in part, reflect dissatisfaction with assessing physician readiness solely based upon performance in MCQ examinations, but the new formats are not without their own set of challenges.

HISTORY OF SIMULATION USE ON MEDICAL LICENSING EXAMINATIONS

In this context, the term “simulation” is being used in a relatively broad sense. It represents any attempt to reproduce, during testing, relevant aspects of the medical practice environment. This reproduction can vary from the replication of a few isolated features of a clinical task to the recreation of the entire clinical context. There are numerous examples of such attempts, going back to the beginnings of the examination systems used to support medical licensure.

The first NBME examinations, given in 1916, lasted several days and incorporated essay, laboratory, oral, practical, and bedside components.5 As an example, the practical examination in surgery required examinees to suture together two segments of a dog intestine. Performance was assessed by determining if the sutures could withstand a prescribed level of water pressure.

In 1922, the NBME’s examination programme was restructured. The first two components required for NBME certification assessed understanding of the basic biomedical sciences and the fundamentals of clinical medicine, both through essay questions. The final component included observed patient encounters followed by oral examinations regarding those encounters. This structure persisted until the late 1950s, when studies of the bedside oral exam clearly documented its psychometric inadequacies. Scores were found to provide more information about the examiner than about the examinee. Agreement between examiners observing examinees with different patients was at near-chance levels,6 reflecting both differences in examiner stringency and variation in the quality of an examinee’s performance from one clinical situation to another. High costs and logistical difficulties in administering this type of examination for the rapidly growing post-war cohort of medical students also contributed to its demise.5

Over the next two decades, test developers experimented with a number of assessment methods, all intended to recover the assessment of clinical skills that was lost with the elimination of observed clinical encounters. Initially, motion pictures of clinical encounters were projected to examinees, who answered MCQs about the portrayed encounter. The key problem with this format was logistical: it was difficult to standardise the test administration conditions for projection of films.5

During the late 1960s, in an effort to pose more realistic challenges to medical decision making skills, patient management problems (PMPs) were introduced. These multi-step problems began with an “opening scenario” that provided a brief description of a patient care situation. The examinee then proceeded through a series of “scenes” in which additional information was gathered (history taking, physical examination, and laboratory scenes), followed by one or more scenes in which patient management activities were initiated. “Latent image” pens were used to select actions and reveal feedback summarising the consequences of the actions.

PMPs were commonly used on US and Canadian medical licensing and certification examinations until the late 1980s, when the following problems led to their elimination.7 Like bedside oral examinations, relatively small numbers of PMPs were included on the examination, and the small sample of cases resulted in relatively unreliable scores. Additionally, in many clinical situations, a broad range of patient management strategies are possible, and it was sometimes difficult to develop scoring keys that appropriately rewarded alternate strategies that were similar in quality. Variation in examinees’ response style was also problematic; scoring keys tended to reward the thorough examinee and penalise the efficient examinee, leading to scores that, in part, reflected examinees’ propensity to take action in an uncertain situation, rather than just their clinical decision making skills.8

CURRENT USE OF SIMULATIONS ON USMLE

Simulations vary considerably in fidelity: one could view MCQs as simulations (at least if they begin with a patient description and require examinees to indicate a clinical decision) at the low end of the fidelity continuum, providing an assessment of examinees’ proficiency in applying their knowledge to descriptions of case situations. Assessments using standardised patients (SPs) lie at the other end of the fidelity continuum, providing a realistic context for measuring the skills involved in taking a history and performing a physical examination. Computer based case simulations fall in between. This section provides an overview of how each is currently used in USMLE.

Patient based MCQs

Patient based MCQs have been used for decades on licensing examinations; they have appeared in all three steps since the introduction of USMLE. On Step 1, these take the form of brief descriptions of patient care situations, followed by questions challenging examinees to use their understanding of basic biomedical science to explain or predict patient findings.9 Roughly 60% of the items on Step 1 currently take this form. In Step 2, virtually all items begin with a description of a clinical situation; the patient presentations are longer and less prototypic than the patient descriptions on Step 1. Examinees must differentiate important from incidental findings and indicate a clinical decision, generally a diagnosis or the next step in patient care.10 Step 3 MCQs also provide detailed descriptions of physician–patient encounters. Items are organised by the encounter setting (for example, office or emergency department) and are often presented in sets termed “case clusters” in which a series of MCQs address different facets of an unfolding clinical situation.11

From a content sampling perspective, because relatively large numbers of MCQs can be administered per hour of testing time, MCQs on all three USMLE steps provide an efficient method for assessment of decision making skills. The degree of fidelity depends, in part, upon the length of the patient description, the level of detail provided, and the extent to which patient findings are provided in an interpreted versus an undigested format.12 In the near future, it is likely that the fidelity of MCQs on USMLE will increase, as all three steps take advantage of computer based test administration to incorporate multimedia into item “stems,” thus enriching patient presentations.

Computer based case simulations

Three decades of research effort eventually culminated in the inclusion of uncued, interactive CCS on USMLE Step 3 in 1999.13 In CCS, the examinee is presented with a brief description of a patient, including a chief complaint and a brief history.11,14 From that point forward, the case unfolds as the examinee works up and manages the computer simulated patient, obtaining diagnostic information, ordering therapeutic interventions, and monitoring patient progress. Any of several thousand diagnostic and therapeutic manoeuvres can be requested by the examinee in free text on an “order sheet.” As simulated time passes, the patient’s condition changes based on the underlying medical problem and the examinee’s interventions; results of tests are reported and the impact of interventions must be monitored. Examinees are scored on CCS using an algorithm that essentially compares their patient management strategies with policies obtained from experienced clinicians.15 Examinees must balance thoroughness, efficiency, timeliness, and avoidance of risk in responding to clinical situations, with dangerous and unnecessary actions lowering scores. Although CCS cases have proven expensive to develop, administer, and score, psychometric analyses have indicated that CCS cases measure some of the management knowledge and skills that are not easily addressed by MCQs, and they do so with a reasonable degree of precision.16,17
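The idea of scoring an examinee's orders against an expert-derived key can be illustrated with a small sketch. This is not the operational USMLE algorithm, which is not specified here; the action names, weights, deadlines, and the late-credit fraction below are all hypothetical and chosen only to show how beneficial, late, unnecessary, and dangerous actions might move a case score in different directions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KeyEntry:
    """One entry in a hypothetical expert-derived scoring key."""
    weight: float                           # positive = beneficial, negative = unnecessary or dangerous
    deadline_hours: Optional[float] = None  # simulated time by which a beneficial action should occur


# Hypothetical key for a single case; every value here is illustrative only.
HYPOTHETICAL_KEY = {
    "chest x-ray": KeyEntry(weight=2.0, deadline_hours=2.0),
    "antibiotics": KeyEntry(weight=3.0, deadline_hours=4.0),
    "exploratory surgery": KeyEntry(weight=-4.0),  # dangerous in this case
    "whole body ct": KeyEntry(weight=-1.0),        # unnecessary in this case
}


def score_case(orders, key=HYPOTHETICAL_KEY, late_fraction=0.5):
    """Sum key weights over an examinee's orders.

    orders: list of (action, simulated_hours_elapsed) tuples.
    Beneficial actions ordered after their deadline earn only
    `late_fraction` of their weight (an assumed timeliness rule).
    """
    total = 0.0
    seen = set()
    for action, hours in orders:
        entry = key.get(action)
        if entry is None or action in seen:
            continue  # unkeyed or repeated orders are ignored in this sketch
        seen.add(action)
        credit = entry.weight
        is_late = entry.deadline_hours is not None and hours > entry.deadline_hours
        if entry.weight > 0 and is_late:
            credit *= late_fraction
        total += credit
    return total
```

In this sketch a timely work-up scores higher than the same work-up delivered late, and adding a dangerous order lowers the total, mirroring the balance of thoroughness, timeliness, and avoidance of risk described above.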

Clinical skills assessment using standardised patients

In mid 2004, a standardised patient based clinical skills examination was added as a component of Step 2 of USMLE. This component, called Step 2 CS, was developed in collaboration with the ECFMG; its purpose is to assure the public that successful candidates for licensure are competent in the fundamental clinical skills required for safe and effective patient care. These clinical skills include taking a relevant medical history, performing an appropriate physical examination, communicating effectively with the patient, clearly and accurately documenting the findings and diagnostic hypotheses from the clinical encounter, and listing appropriate initial diagnostic studies.

The examination consists of 12 encounters with SPs portraying common and important clinical problems. History taking questions and physical examination manoeuvres are recorded by SPs using case specific dichotomous checklists that are completed after each encounter. Communication and interpersonal skills, and spoken English proficiency, are evaluated by SPs using generic rating scales. The patient note, a clinical summary recorded after the encounter, is scored by physician raters using holistic methods. Examinees are assessed on three subcomponents: Integrated Clinical Encounter (ICE), which includes data gathering (history taking and physical examination) and the patient note; Communication and Interpersonal Skills (CIS); and Spoken English Proficiency (SEP). Examinees must pass all three subcomponents in order to pass Step 2 CS overall.
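The conjunctive pass rule described above (an examinee must pass every subcomponent, so strength in one cannot offset weakness in another) can be sketched as follows. The cut scores and the 0 to 1 score scale are hypothetical placeholders, not USMLE standards, and the checklist helper simply illustrates how dichotomous items reduce to a proportion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubcomponentScores:
    """Subcomponent scores on an assumed 0-1 scale."""
    ice: float  # Integrated Clinical Encounter (data gathering and patient note)
    cis: float  # Communication and Interpersonal Skills
    sep: float  # Spoken English Proficiency


# Hypothetical cut scores, for illustration only.
HYPOTHETICAL_CUTS = {"ice": 0.60, "cis": 0.65, "sep": 0.70}


def checklist_score(items):
    """Proportion of dichotomous checklist items credited by the SP."""
    if not items:
        raise ValueError("checklist must contain at least one item")
    return sum(1 for done in items if done) / len(items)


def passes_overall(scores, cuts=HYPOTHETICAL_CUTS):
    """Conjunctive rule: every subcomponent must meet its own cut score."""
    return all(getattr(scores, name) >= cut for name, cut in cuts.items())
```

Under this rule an examinee with very high ICE and CIS scores still fails overall if the SEP score falls below its cut, which is the practical difference between a conjunctive standard and one based on an average.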

Students graduating from MD granting US and Canadian medical schools in 2005 or later are required to take USMLE Step 2 CS as part of the requirements for licensure in all United States jurisdictions. USMLE Step 2 CS replaces the Clinical Skills Assessment (CSA®) of the ECFMG, as an essential requirement for ECFMG certification and licensure for graduates of medical schools located outside of the US and Canada, who are seeking postgraduate training opportunities in the United States.

Implementation of USMLE Step 2 CS follows the introduction of similar SP based examinations for high stakes decision making in other countries. The Medical Council of Canada (MCC) has included a multi-station SP based examination as part of the MCC Qualifying Examination for licensure since 1993.18,19 The General Medical Council of the United Kingdom has included a SP based assessment as one component of the Professional Linguistics and Assessment Board (PLAB) examination since 1998.20 The above mentioned ECFMG CSA also has been in place since 1998.21 The introduction of SP based examinations for licensure stems from two important and converging concepts: firstly, a growing body of evidence and experience supporting the value of SP based methods for assessing clinical skills, and secondly, a re-emerging acknowledgment that these patient centred skills are critical to safe and effective patient care.

High stakes clinical skills examinations using SPs are developed and administered in a manner that is based upon and supported by approximately four decades of research and experience. Indeed, a significant part of the research into this particular form of simulation was conducted by the organisations (NBME and ECFMG) collaborating in USMLE Step 2 CS development. Similar to other forms of simulation, the use of SPs for assessment offers clear benefits, including more certain availability of relevant test material during examinations, absent risk to real patients, an environment and set of examinee tasks that appear more authentic relative to the attributes assessed, and decreased cost compared to the use of clinician raters.

Numerous studies suggest that the use of SPs allows for the valid assessment of basic clinical skills.22–26 Careful attention to SP recruiting and training approaches and the use of process checklists and rating scales for scoring examinee performance enhance the reliability of such assessments.22,27–29 As examinee performance varies substantially across clinical content areas, the inclusion of a broad sample of cases is necessary to optimise reliable assessment of clinical skills.22,30 For this reason examinee–SP encounters are typically arranged in a series of stations; examinees are required to demonstrate one or more of the skills of interest in response to varying patient presentations.
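The link between the number of cases and score reliability can be illustrated with the Spearman-Brown prophecy formula, a standard psychometric result that projects the reliability of a test lengthened to k parallel parts. The article does not state which model was used for Step 2 CS, so this is only a sketch, and the single-case reliability values used with it are hypothetical.

```python
def spearman_brown(single_case_reliability, n_cases):
    """Projected reliability of a score averaged over n_cases parallel cases."""
    if not 0.0 < single_case_reliability < 1.0:
        raise ValueError("single-case reliability must lie strictly between 0 and 1")
    if n_cases < 1:
        raise ValueError("n_cases must be at least 1")
    r = single_case_reliability
    return n_cases * r / (1.0 + (n_cases - 1) * r)


def cases_needed(single_case_reliability, target):
    """Smallest number of cases whose projected reliability reaches the target."""
    if not 0.0 < target < 1.0:
        raise ValueError("target reliability must lie strictly between 0 and 1")
    n_cases = 1
    while spearman_brown(single_case_reliability, n_cases) < target:
        n_cases += 1
    return n_cases
```

With an assumed single-case reliability of 0.15, for example, a single encounter says little about an examinee, whereas a form built from many stations yields a far more dependable score; this is the reasoning behind the multi-station design.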

It is clear that the clinical skills assessed by USMLE Step 2 CS are essential to safe and effective patient care. The medical history and physical examination contribute important information to patient diagnosis and management.31,32 Adequate communication and interpersonal skills are associated with enhanced patient satisfaction and improved clinical outcomes, as well as decreased risk for malpractice litigation.33–36 Poor written documentation in the medical record has been identified in ambulatory and inpatient settings as having potential implications for healthcare quality.37–39 Enhancements planned or under consideration for inclusion in Step 2 CS will allow for an even more robust assessment of physical examination skills, without posing safety risks to SPs.

The USMLE, in serving the licensure process, targets assessment at the individual examinee, assuring the public that successful candidates, regardless of education and experience, have met a national standard for performance in those domains assessed by its three Steps. The inclusion of simulation methods, SPs in Step 2, and CCS in Step 3, allows for a richer assessment of patient centred skills and management approaches that are critical to patient care. To the degree that USMLE can determine that a physician-to-be may lack these skills at a very fundamental level, the potential impact for patient safety seems apparent. While the USMLE does not measure the performance of teams per se, attributes that allow examinees to function effectively on a healthcare team are included in the examination. For example, adequate verbal and written communication skills, assessed in Step 2 CS, are essential components of effective teamwork (although the extent to which communication skills exercised within the doctor–patient encounter generalise to effective communication within healthcare teams remains to be established). And of course the inclusion of a patient centred component in the licensure examination underscores the purpose for which healthcare teams are constructed—to provide beneficial care for patients.

FUTURE OF SIMULATIONS FOR MEDICAL LICENSING

It is likely that technological and educational advances will lead to the enhancement of testing formats and scoring modalities already used in USMLE. For example, many medical schools have incorporated computer based training modules, part task trainers that focus on specific procedures and body parts (for example, breast models, pelvic models) and full scale integrated simulators for training purposes.40,41 Despite the fact that their use has been primarily for formative assessment purposes and has been somewhat limited due to cost and the lack of efficient, reliable, and valid scoring rubrics, these formats will likely play a role in examination systems for licensure. As with the CCS component of Step 3 and the CS part of Step 2, once there is evidence to support the use of scores from these formats, these new simulation modalities will certainly be embraced.

While there are a number of new simulation modalities that may be applicable for high stakes licensing examinations, the most likely developments in the near term will relate to the further refinement of current methods, including scoring systems. For CCS, a tremendous amount of research was completed to develop and validate the scoring systems.17,42–46 Even so, new models (for example, neural networks) may eventually lead to even more reproducible scores and more valid assessment decisions.

For Step 2 CS (standardised patients), the introduction of different types of cases (for example, telephone consultation, focused counselling) could enhance the content validity of the assessment. In addition, provided that the psychometric adequacy of the scores is shown to be sufficient, future cases could be constructed so that team dynamics and communication could be assessed. Currently, doctor–patient communication skills are evaluated as part of Step 2 CS; an extension to doctor–nurse communication, or other relevant teams, although complicated, is certainly possible. Part task trainers would also seem to be an appropriate add on. Here, select physical examination skills (for example, breast examination, pelvic examination) that are difficult to measure in SP based assessments could be incorporated in some of the stations.47,48 This addition to Step 2 CS would augment the assessment of technical proficiency in physical examination performance, allowing examinees the opportunity to identify important abnormalities.

In terms of scoring simulation based assessments, there are definitely some enhancements and modifications that could be considered. For example, the approach to statistically modelling expert scorers, used for CCS, or some variant, may be applicable to score the written clinical summaries (patient notes) that are produced by examinees following each of their encounters with the SP. These notes, by design, are intended to measure written communication with the healthcare team, and are an integral part of Step 2 CS.49

There are also a number of new technologies that could eventually find their way into high stakes medical licensure examinations. Virtual reality and haptic feedback trainers have been used to train physicians in areas such as minimally invasive surgical procedures and vascular interventions.50 Unfortunately, the cost for these systems can be high, especially if they are to be used for large scale assessments where many thousands of examinees must be tested. More importantly, these technologies generally target very specialised skills. From a certification or licensure perspective, where the focus centres on measuring fundamental skills as reliably as possible, it may be inefficient and impractical to use these systems. Additional research will be necessary before these types of trainers can be incorporated in high stakes summative assessments.

Life size mannequins (integrated simulators) with realistic airway and cardiovascular attributes have been used to train physicians and other healthcare professionals. For medical licensing examinations, these simulators suffer from some of the same drawbacks as virtual reality and haptic feedback trainers. They are costly and generally target more specialised skills. Moreover, while a number of scoring systems have been developed,51,52 they have yet to undergo the scientific scrutiny that has taken place for SP assessments.53 Nevertheless, as the cost of these mannequins declines, and additional psychometric studies are completed, they could have a unique role within the licensure process, especially for higher order skills. Firstly, mannequins can be used to model rare events, especially those where medical errors would not be reversible if a “real” patient were being managed. This is important in that some clinical skills are difficult, if not impossible, to measure, even with well trained SPs. Secondly, since real time responses to therapeutic interventions can be modelled, the management of patient conditions such as drug interventions can be assessed. Thirdly, it is possible to develop scoring systems that are based on measurable patient outcomes, something that can be difficult to do, at least in a consistent way, with SP based assessments. Finally, for healthcare teams (for example, trauma) it is possible to assess joint patient care efforts, including multidisciplinary communication skills, in a standardised environment. If test content considerations, logistics, and scoring issues can be addressed, the use of integrated simulators as part of licensure examinations may be forthcoming.
