
Uncharted territory: measuring costs of diagnostic errors outside the medical record
Alan Schwartz,1 Saul J Weiner,2 Frances Weaver,3 Rachel Yudkowsky,1 Gunjan Sharma,4 Amy Binns-Calvey,1 Ben Preyss,5 Neil Jordan6,2

1Department of Medical Education, University of Illinois, Chicago, Illinois, USA
2Center for the Management of Complex Chronic Care, US Department of Veterans Affairs, University of Illinois, Chicago, Illinois, USA
3Center for the Management of Complex Chronic Care, US Department of Veterans Affairs, Stritch School of Medicine, Loyola University, Chicago, Illinois, USA
4Department of Medicine, University of Illinois, Chicago, Illinois, USA
5Department of Family Medicine, Northwestern University, Chicago, Illinois, USA
6Department of Psychiatry & Behavioral Sciences, Institute for Healthcare Studies and Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA

Correspondence to Dr Alan Schwartz, Department of Medical Education, University of Illinois, 808 S Wood St, 986 CME Mail Code 591, Chicago, IL 60612, USA; alansz{at}


Context In a past study using unannounced standardised patients (USPs), substantial rates of diagnostic and treatment errors were documented among internists. Because the authors know the correct disposition of these encounters and obtained the physicians' notes, they can identify both necessary treatment that was not provided and unnecessary treatment that was. They can also discern which errors can be identified from a review of the medical records alone.

Objective To estimate the avoidable direct costs incurred by physicians making errors in our previous study.

Design In the study, USPs visited 111 internal medicine attending physicians. They presented variants of four previously validated cases that jointly manipulate the presence or absence of contextual and biomedical factors that could lead to errors in management if overlooked. For example, in a patient with worsening asthma symptoms, a complicating biomedical factor was the presence of reflux disease and a complicating contextual factor was inability to afford the currently prescribed inhaler. Costs of missed or unnecessary services were computed using Medicare cost-based reimbursement data.

Setting Fourteen practice locations, including two academic clinics, two community-based primary care networks with multiple sites, a core safety net provider, and three US Department of Veterans Affairs facilities.

Main outcome measures Contribution of errors to costs of care.

Results Overall, errors in care resulted in predicted costs of approximately $174 000 across 399 visits, of which only $8745 was discernible from a review of the medical records alone (without knowledge of the correct diagnoses). The median cost of error per visit with an incorrect care plan differed by case and by presentation variant within case.

Conclusions Chart reviews alone underestimate costs of care because they typically reflect appropriate treatment decisions conditional on (potentially erroneous) diagnoses. Important information about patient context is often entirely missing from medical records. Experimental methods, including the use of USPs, reveal the substantial costs of these errors.

  • Decision making
  • evidence-based medicine
  • health professions education
  • cognitive biases
  • diagnostic errors
  • comparative effectiveness research
  • cost effectiveness
  • health policy
  • health services research
  • mental health


Assessing how physicians perform in practice is challenging.1 Healthcare providers are conventionally evaluated through chart reviews and patient-completed satisfaction surveys, both of which have significant limitations.2,3 Neither tells us when physicians overlook important information during the encounter, resulting in unnecessary healthcare costs from treating another, incorrect diagnosis or from failing to treat a missed diagnosis.

Sending standardised patients into clinical practice settings incognito has been proposed as the ‘gold standard’ measure of physician performance.4 Actors are trained to present with complaints indicative of significant conditions, and providers are then assessed on whether they respond according to evidence-based guidelines. These ‘unannounced standardised patients’ (USPs) have a major advantage over regular standardised patients: the physician is unaware of being assessed, and the evaluation occurs in the physician's usual practice setting. As a result, USPs are periodically employed as a comparison reference for other methods of performance assessment.5–8

We recently published the largest study to date using USPs.9 In that study, we found that board-certified internists studied in their office-based practices had difficulty adapting care to individual patient context, a phenomenon we have termed ‘contextual error’.10,11 Our USPs presented scripts with specific ‘red flags’ designed to suggest contextual and biomedical confounders to a typical clinical presentation. Our study identified a high incidence of performance problems in appropriately attending to these red flags.

When a physician misses red flags and forms an incorrect diagnosis, it is the incorrect diagnosis and its treatment that appear in the medical record. An assessment of the physician's management that relies on the medical record will therefore rest on an incorrect assumption about the actual diagnosis. For instance, if a physician misses the diagnosis of hypothyroidism in a patient presenting with weight gain and constipation, the medical record will record only how the physician managed those two symptoms. Contextual information, in particular, was unlikely to be elicited by the physician, and even less likely to be noted in the medical record. For example, one of our study cases was a patient presenting with worsening asthma as a result of inability to afford his daily brand-name inhaler after losing his job. Physicians who failed to attend to the red flags about job loss treated the patient by increasing the dosage (and associated cost) of the medication the patient already could not afford. The medical record, however, would reflect a patient with worsening asthma who had been (apparently appropriately) prescribed a more potent maintenance medicine. The cost of such a misdiagnosis would not become apparent unless the patient returned (possibly in status asthmaticus) to the same facility and a more astute interviewer asked how he uses his inhaler and why.

If the patient had visited a more astute physician at the start, the medical record would have reflected a discussion about insurance problems leading to difficulty accessing prescribed medications, which the physician (appropriately) addressed by prescribing a generic inhaler.

In short, our study provides an opportunity to estimate the costs of errors that would not be appreciated by routine review of the medical record. Given the apparent frequency of these errors in response to our cases, these costs may be significant. In this paper, we describe and characterise those costs.


Study design and participants

A full description of the study methods appears in our report of the rate of contextual errors.9 Briefly, we obtained data on 111 primary care internal medicine physicians in practice at 14 locations in the Midwest between April 2007 and April 2009. Each physician was visited by up to four different USPs (399 visits), presenting randomly selected counterbalanced combinations of variants of four cases combined in a partial factorial design with the presence or absence of a biomedically complicated or contextually complicated diagnosis. In every variant, USPs included biomedical and contextual red flags in their presentations; variants differed by whether USPs confirmed or denied findings suggested by the red flags if they were probed by the physician. Table 1, reprinted from that study, describes the cases and variants, and the online appendix presents the full details of the cases. For example, in the asthma case, a middle-aged man presents with inadequately controlled symptoms on his current medication regimen, a brand-name steroid inhaler. In the baseline variant, the physician should increase the patient's steroid dosage or add another maintenance drug. In the biomedical variant, the patient's symptoms are the result of gastro-oesophageal reflux disease (GERD), and the physician should instead prescribe treatment for GERD. In the contextual variant, the patient cannot afford his inhaler and is not using it daily, and the physician should instead prescribe a less expensive generic inhaler.

Table 1

Cases and variants (from Weiner et al,9 reprinted with permission)*

A characteristic of the case design is that the baseline variant of each case represented the diagnostically uncomplicated version of that case, so that the focus of assessment is on the appropriateness of the management plan. Because the diagnosis was not in question, in the baseline variant, a chart review would reveal inappropriate care when evidence-based practices were not followed. For instance, in the asthma case, the physician was assessed based on whether guidelines were followed for stepping up medication therapy, a determination that could be made by reviewing the orders placed in the medical record. In contrast, physician errors made in the biomedical, contextual and bio/contextual variants of cases were not detectable from a review of the medical record as they occurred when physicians failed to recognise that a clinical situation was complicated (eg, the patient's symptoms were due to GERD or inability to afford his inhaler) and treated the patients as if they were presenting with the baseline variant.

Visits were audiorecorded and transcribed. USPs were entered into the scheduling system as actual patients, and copies of the physicians' notes following encounters were forwarded to the project team. Physicians' notes were coded for treatments and tests ordered blind to any knowledge of the case variant. Inter-rater reliability checks were performed for samples of all ratings. The costs of visits were paid by the study. Institutional Review Boards associated with all practice sites approved the protocol.

Cost estimates

We adopted the economic perspective of the patient and their third party payer, if any, with a time horizon of the expected consequences of care during the 30 days following the consultation. We considered only the direct consequences of care associated with diagnosis or misdiagnosis. We did not consider downstream costs beyond the initial recommendations from the consultation, and we did not consider societal costs not incurred by the patient or payer, such as lost productivity. We included only resources related to the immediate diagnostic and therapeutic management of the four conditions (see table 2 for the list of resources we considered for the biomedical and contextual management of each case).

Table 2

Numbers of errors and bases for cost estimates by service missed or used unnecessarily in each variant

Resources were direct medical costs in the case of unnecessary treatment and foregone direct medical costs in the case of undertreatment. For biomedical variants, the decision of which resources to include was made on the basis of established clinical guidelines for the management of the relevant condition. For contextual variants, clinical experts provided the correct management plans and associated resources during the case development and validation (reported previously elsewhere).12

We estimated resource consumption by abstracting over-treatment and under-treatment from the chart. Table 2 lists how much of each resource was measured across the entire cohort. Unit costs were measured in 2009 US dollars, using 2009 Medicare reimbursements by Current Procedural Terminology code, or average sales price for medications when available, and local costs (eg, for labour), based on the care plan recommended by the physician. Table 2 gives the costs we assigned for all the missed or unnecessary services associated with our study.

When an unnecessary test or treatment was ordered, we conservatively estimated the cost of error as the cost of the test or treatment procedure itself (eg, we did not consider potential downstream costs associated with false positives or downstream savings associated with a serendipitous identification of additional health problems). When an indicated necessary test or treatment was not ordered, we similarly made the conservative assumption that the cost of the procedure itself was equal to its expected marginal benefit, and used the procedure cost as the cost of error. We did not include costs associated with travel time, lost wages or other indirect costs. As a result, we likely underestimated the true total costs of errors.

For example, when a patient with worsening asthma due to inability to afford an expensive daily medication is mistakenly instructed to increase the dosage of that medication rather than prescribed a lower-cost inhaler, we assume that the new expensive prescription will also not be filled, so the direct cost of unnecessary treatment is $0. However, the physician has missed the opportunity to provide the patient with a lower-cost inhaler, and the patient loses the benefit associated with that inhaler. We assume the value of this benefit is, on average, at least the cost of the generic inhaler for 1 month, estimated at $163.79 from our Medicare data. That is, the rationale for providing an inhaler to a patient with asthma is that, on average, the inhaler will provide enough benefit to justify its cost, and thus at minimum $163.79 of expected benefit per patient over a 30-day horizon; we treat failure to provide this standard of care as incurring a cost of this amount, which serves as a proxy for the expected costs associated with unmanaged asthma over a population of patients (some of whom will experience no actual costs, some of whom will incur the cost of additional clinic visits, some of whom will incur the cost of emergency room visits, etc). Accordingly, the direct medical cost of error in this case is $163.79, which is clearly a very conservative estimate.
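The conservative costing rule described above can be sketched as a small function. This is an illustrative reconstruction, not the study's actual costing code: the $163.79 generic inhaler price comes from the text, while the other price and the function name are hypothetical.

```python
def cost_of_error(missed_services, unnecessary_services, filled=True):
    """Conservative direct cost of error for one visit.

    missed_services: unit costs of indicated care that was not ordered;
        each cost is used as a proxy for the foregone expected benefit.
    unnecessary_services: unit costs of care ordered without indication;
        counted only if the care was actually provided/filled.
    """
    missed = sum(missed_services)
    unnecessary = sum(unnecessary_services) if filled else 0.0
    return missed + unnecessary

# Asthma/contextual example from the text: the unnecessary higher-dose
# prescription is assumed to go unfilled (cost $0), while the missed
# generic inhaler is costed at its 30-day Medicare price.
cost = cost_of_error(missed_services=[163.79],
                     unnecessary_services=[250.00],  # hypothetical price
                     filled=False)
# cost == 163.79
```

The rule deliberately ignores downstream consequences (false positives, cascade effects), which is why the paper describes the resulting totals as lower bounds.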

We classified the care in each visit as appropriate (a correct plan of care with no unnecessary care), underuse (failure to order necessary care, but also no unnecessary care), overuse (ordering unnecessary care in addition to necessary care), and misuse (failure to provide necessary care while also providing unnecessary care). By definition, the cost of error for an appropriate plan was $0. Note that it is sometimes possible for overuse to have a cost of $0 when the ordered care is not actually provided (as in the asthma case, when the patient is required to fill a prescription and would likely not do so). Because physicians differed in their rate of planning correct care among case variants, we present the cost of error by variant across all encounters and across only those encounters in which care was not appropriate. Cost of errors in each of the variants was compared using Kruskal–Wallis one-way non-parametric analysis of variance with pairwise comparisons between variants.
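The four-way classification and the variant comparison described above can be sketched as follows. This is a hedged illustration, not the study's analysis code: the per-visit cost figures are made up for demonstration, and scipy's `kruskal` function is used for the one-way non-parametric ANOVA (pairwise comparisons between variants are omitted).

```python
from scipy.stats import kruskal

def classify_visit(missed_necessary, ordered_unnecessary):
    """Classify one visit's care plan into the study's four categories."""
    if not missed_necessary and not ordered_unnecessary:
        return "appropriate"   # correct plan, no unnecessary care
    if missed_necessary and not ordered_unnecessary:
        return "underuse"      # necessary care omitted
    if not missed_necessary and ordered_unnecessary:
        return "overuse"       # unnecessary care added to necessary care
    return "misuse"            # necessary care omitted, unnecessary care given

# Hypothetical per-visit costs of error (US$) by case variant;
# illustrative numbers only, not the study's data.
costs_by_variant = {
    "baseline":   [0.0, 0.0, 163.79, 163.79, 0.0],
    "biomedical": [30.0, 0.0, 30.0, 55.0, 0.0],
    "contextual": [231.0, 163.79, 0.0, 231.0, 300.0],
}

# One-way non-parametric analysis of variance across the variants.
h_stat, p_value = kruskal(*costs_by_variant.values())

label = classify_visit(missed_necessary=True, ordered_unnecessary=False)
# label == "underuse"
```

Note that, as in the text, a visit classified as overuse can still carry a $0 cost of error when the unnecessary order (eg, an unfilled prescription) is never actually provided.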


Overall, errors in care resulted in predicted costs of approximately $174 000 across all 399 visits. Baseline errors, that is, those that could be detected via a chart review, accounted for only a fraction of error-related costs ($8745). This was, in part, because physicians were much less likely to make baseline errors: physicians provided inappropriate baseline care just 28% of the time, compared with error rates ranging from 61% to 93% for biomedically and/or contextually complex care.

The costs of errors also varied considerably by case variant. The median cost of error per visit with an incorrect care plan was $164 in the baseline variant, $30 in the biomedical variant, $231 in the contextual variant, and $224 in the biomedical/contextual variant (table 3). The cost of erroneous care was not significantly different between the contextual variant and the biomedical/contextual variant (Z=0.77, p=0.44), but was significantly higher in the contextual variant than in either the baseline variant (Z=2.85, p=0.004) or biomedical variant (Z=6.65, p<0.001). The cost of error was significantly higher in the biomedical than baseline variant (Z=2.33, p=0.02).

Table 3

Total and median costs of errors in care by variant (US$)

Table 4 shows the proportion of encounters in each category, along with median costs of errors: appropriate care (a correct plan with no unnecessary care), underuse (necessary care omitted with no unnecessary care), overuse (unnecessary care ordered in addition to necessary care) and misuse (necessary care omitted while unnecessary care was provided). In our cases, costs of errors arose primarily through underuse: failure of the practitioner to order tests or treatments essential for management of the biomedically or contextually complex condition.

Table 4

Median cost (IQR) (US$) and frequency of errors by usage patterns


Errors associated with routine care can sometimes be detected in the medical record. For example, failure to treat hypertension when high blood pressure appears in the vitals in the record is an error that would be detected on chart review. In our study, these errors were infrequent, and when they occurred, they were generally not costly.

However, errors due to inattention to biomedical or contextual factors in the medical history are unlikely to be detected through medical record review, as the care will appear appropriate for an apparently routine case. These errors were more frequent in our study than errors detectable in the chart, and when they occurred, they were often costly. In our cases, the immediate costs of contextual errors were higher than those of failures to address biomedical symptoms, suggesting that a physician who is better at listening and contextualising care may incur lower costs of error than a physician who is a biomedical expert. Our USP method reveals that these errors lie in ‘uncharted territory’.

This study has several limitations. The specific costs and patterns of costs are case specific, and do not necessarily generalise to other clinical presentations. For instance, physicians who failed to identify nutritional deprivation in the patient with weight loss typically embarked on a costly malignancy workup. Had other scenarios been chosen, the rates of error and resulting costs would vary. Nevertheless, the biomedical and contextual issues selected for this study are all well-documented, common problems in large segments of the American population.13–16

In addition, we relied on cost to measure foregone benefit because the benefit could not be observed in cases in which beneficial treatment was missed. We assumed no moral hazard, as occurs when insured patients receive overpriced or unnecessary care.17 Given that we used clinical standards to define appropriate services and that rates were set by Medicare, any distortion is likely to be minimal.18 Our approach also did not factor in the downstream cascade effects of inappropriate care beyond our 30-day horizon. Indeed, some errors may well be ‘cost saving’ in the short term because their impact on health and on the cost of care is not incurred until months or years later. Without a lifetime horizon and models of each chronic disease condition, we could not incorporate these impacts; had they been included, the cost of errors would likely have been substantially higher. Conversely, it is possible that some errors would be corrected at a subsequent visit, which is why we kept our time horizon short, at 30 days. The average visit interval in primary care has been reported as 4 months but varies considerably.19

Although it can be straightforward to track whether physicians are adhering to guidelines as a quality indicator when the diagnosis is not in question, it is not at all straightforward to determine whether they are appropriately individualising care when confounding medical or contextual complexities so dictate. This study demonstrates how broadening the assessment of physician performance to include this metric unmasks serious performance problems, resulting in undocumented errors that incur substantial unmeasured costs but are not perceptible through medical record review. Strategies to address the challenge of individualising clinical decisions through provider education and new measures of performance, including directly observed care, may be urgently warranted.


We thank Simon Auster, MD, JD, Uniformed Services, University of the Health Sciences, and Robert Kaestner, PhD, University of Illinois at Chicago for their critical input, particularly during the planning and implementation stages of our study.


Supplementary materials

  • Supplementary Data


• Funding This study was supported by the US Department of Veterans Affairs, Health Services Research and Development Service (IIR 04-107). The funding organisation had no role in the design and conduct of the study; in the collection, analysis and interpretation of the data; or in the preparation, review or approval of the manuscript.

  • Competing interests Alan Schwartz and Saul Weiner are owners of a company that provides management consulting services to healthcare providers and institutions interested in collecting customer service and performance data using methods employed in this study (unannounced standardised patients). Amy Binns-Calvey is a consultant to the company. They have not received any consulting fees, honorarium, contracts or other payments to date. The remaining authors have no relationships or activities that could appear to have influenced the submitted work.

  • Ethics approval The study was approved by the Institutional Review Board of the University of Illinois at Chicago and Jesse Brown VA Medical Center. It was also approved at IRB affiliates of all sites.

  • Provenance and peer review Not commissioned; externally peer reviewed.