A Hierarchical Outcomes Approach to Test Assessment

https://doi.org/10.1016/S0196-0644(99)70421-X

Abstract

This report describes a hierarchical classification system for clinical endpoints in diagnostic technology assessment. The model requires classification of investigative outcomes into 1 of 6 categories: technical efficacy, diagnostic accuracy efficacy, diagnostic thinking efficacy, therapeutic efficacy, clinical outcome efficacy, and societal efficacy. Evaluations at successive levels indicate a broader understanding of the test’s value. The purpose of this classification system is to help readers of the medical literature consider various aspects of test performance and to identify aspects of a test that require additional investigation. It is also designed to be used as a template for systematic reviews of diagnostic tests. In this review, examples from the medical literature are discussed to highlight important concepts from the hierarchy model and their relevance to emergency medicine.

[Pearl WS: A hierarchical outcomes approach to test assessment. Ann Emerg Med January 1999;33:77-84.]

Section snippets

INTRODUCTION

Many authors and readers focus on sensitivity and specificity to assess the merits of a diagnostic test. Although these familiar parameters are important to consider, they ignore other aspects of a test’s value. In this review a classification system for diagnostic technology assessments, reported by Fryback and Thornbury1 in 1991, is used as the basis for a discussion of other ways of evaluating diagnostic tests, their efficacy, and their ultimate value in emergency medicine.

The model used

TECHNICAL EFFICACY

Technical efficacy refers to a test’s ability to produce usable information.1, 2 This includes factors affecting test interpretation and implementation. The concept can be elaborated to include consideration of several related issues.2, 3, 4, 5

A recently developed whole blood rapid troponin T (TT) assay, the cardiospecific troponin T immunoassay (cTnT), serves as an example where technical efficacy is relevant to diagnostic testing in emergency medicine.6, 7, 8 The test is performed at the

DIAGNOSTIC ACCURACY EFFICACY

This category includes measurements of a test’s ability to detect or exclude disease compared with a criterion standard. In addition to sensitivity, specificity and predictive values (Figure 2), likelihood ratios and receiver operating characteristic (ROC) curves18 are other important measurements to consider.

  • Sensitivity (true-positive rate) = True positives/(True positives + False negatives)

  • Specificity (true-negative rate) = True negatives/(True negatives + False positives)

  • Positive predictive
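The accuracy measures named in this section can be sketched in a few lines of Python. This is a minimal illustration rather than anything taken from the article: the class name `TwoByTwo`, its method names, and the counts are all hypothetical, and the predictive-value and likelihood-ratio formulas are the standard textbook definitions, supplied here because the list above is cut off.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from comparing a test against a criterion standard.

    Hypothetical helper for illustration; assumes every denominator is nonzero.
    """
    tp: int  # test positive, disease present
    fp: int  # test positive, disease absent
    fn: int  # test negative, disease present
    tn: int  # test negative, disease absent

    def sensitivity(self) -> float:
        # True positives / (true positives + false negatives)
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # True negatives / (true negatives + false positives)
        return self.tn / (self.tn + self.fp)

    def positive_predictive_value(self) -> float:
        # Fraction of positive results that are true positives
        return self.tp / (self.tp + self.fp)

    def negative_predictive_value(self) -> float:
        # Fraction of negative results that are true negatives
        return self.tn / (self.tn + self.fn)

    def positive_likelihood_ratio(self) -> float:
        # Sensitivity / (1 - specificity)
        return self.sensitivity() / (1 - self.specificity())

    def negative_likelihood_ratio(self) -> float:
        # (1 - sensitivity) / specificity
        return (1 - self.sensitivity()) / self.specificity()


# Hypothetical counts, chosen only to keep the arithmetic easy to follow.
table = TwoByTwo(tp=90, fp=20, fn=10, tn=180)
print(f"sensitivity = {table.sensitivity():.3f}")
print(f"specificity = {table.specificity():.3f}")
print(f"PPV = {table.positive_predictive_value():.3f}")
print(f"NPV = {table.negative_predictive_value():.3f}")
print(f"LR+ = {table.positive_likelihood_ratio():.2f}")
print(f"LR- = {table.negative_likelihood_ratio():.3f}")
```

Unlike sensitivity and specificity, the predictive values depend on the proportion of diseased patients in the sample, so they shift when the same test is applied in a population with a different prevalence.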

DIAGNOSTIC THINKING EFFICACY

This level of analysis is concerned with assessment of the effect of test information on diagnostic reasoning and disease categorization.1 Studies of diagnostic thinking serve as a proxy for estimating the effect of a test on patient care.

Diagnostic thinking assessments are based on measuring clinicians’ subjective impression of disease status. Most designs require clinicians to prospectively record their clinical diagnosis or differential diagnosis, to report their subjective pretest
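A rough sketch of how such recordings might be summarized follows. It assumes each clinician records a leading diagnosis and a subjective probability of the target disease both before and after the test result; the `Assessment` record, the two summary functions, and the sample cases are hypothetical and are not drawn from any study discussed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assessment:
    """One clinician's recorded impression before and after the test result."""
    pretest_diagnosis: str
    pretest_probability: float   # subjective probability of target disease, 0 to 1
    posttest_diagnosis: str
    posttest_probability: float  # same estimate, re-recorded after the result


def fraction_diagnosis_changed(cases):
    """Share of cases in which the leading diagnosis differs after testing."""
    changed = sum(1 for c in cases if c.pretest_diagnosis != c.posttest_diagnosis)
    return changed / len(cases)


def mean_absolute_probability_shift(cases):
    """Average size of the change in subjective probability, ignoring direction."""
    total = sum(abs(c.posttest_probability - c.pretest_probability) for c in cases)
    return total / len(cases)


# Hypothetical records for illustration only.
cases = [
    Assessment("unstable angina", 0.30, "myocardial infarction", 0.80),
    Assessment("musculoskeletal pain", 0.10, "musculoskeletal pain", 0.05),
    Assessment("myocardial infarction", 0.60, "myocardial infarction", 0.90),
    Assessment("unstable angina", 0.40, "unstable angina", 0.20),
]

print(f"diagnosis changed in {fraction_diagnosis_changed(cases):.0%} of cases")
print(f"mean absolute probability shift = {mean_absolute_probability_shift(cases):.4f}")
```

Both summaries describe a change in the clinician's reasoning, not a change in patient care, which is why studies at this level serve only as a proxy.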

THERAPEUTIC EFFICACY

Investigations at this important level seek to determine the effect of testing on patient management. Unfortunately, these are rarely performed before a test diffuses into general practice.2, 22 After widespread diffusion of the test, it is difficult to conduct a randomized, controlled trial of the test. Therefore investigators often resort to prospective case-series designs that assess management plans before and after testing, as demonstrated by the following example.

Weissman et al29

CLINICAL OUTCOME EFFICACY

The value of a therapeutic intervention lies in its ability to improve outcome or to provide an equivalent outcome at reduced cost. The same should be true of a diagnostic test. However, in most situations clinical outcome is temporally remote from testing, and the relationship between the two is not always apparent.30, 31 Consequently, there is a paucity of outcome level studies of diagnostic tests.30 Decision analysis provides an alternative for estimating the effects of diagnostic testing
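As a simplified sketch of the decision-analysis idea, the following compares the expected utility of three strategies: treat according to the test result, treat everyone, and treat no one. All inputs are hypothetical assumptions made for illustration (the prevalence, the test characteristics, and the utility assigned to each combination of disease status and treatment); none of them comes from the article or its references.

```python
def expected_utility_with_test(prevalence, sensitivity, specificity, utility):
    """Expected utility when treatment is given only after a positive test.

    `utility` maps (diseased, treated) to a value between 0 and 1.
    """
    p_true_pos = prevalence * sensitivity
    p_false_neg = prevalence * (1 - sensitivity)
    p_true_neg = (1 - prevalence) * specificity
    p_false_pos = (1 - prevalence) * (1 - specificity)
    return (
        p_true_pos * utility[(True, True)]      # diseased and treated
        + p_false_neg * utility[(True, False)]  # diseased but missed
        + p_false_pos * utility[(False, True)]  # healthy but treated
        + p_true_neg * utility[(False, False)]  # healthy and left alone
    )


def expected_utility_treat_all(prevalence, utility):
    """Expected utility when every patient is treated without testing."""
    return prevalence * utility[(True, True)] + (1 - prevalence) * utility[(False, True)]


def expected_utility_treat_none(prevalence, utility):
    """Expected utility when no patient is tested or treated."""
    return prevalence * utility[(True, False)] + (1 - prevalence) * utility[(False, False)]


# Hypothetical utilities on a 0-to-1 scale (1.0 = best possible outcome).
utility = {
    (True, True): 0.90,    # disease present, treated
    (True, False): 0.50,   # disease present, untreated
    (False, True): 0.95,   # disease absent, treated unnecessarily
    (False, False): 1.00,  # disease absent, untreated
}

eu_test = expected_utility_with_test(0.2, 0.9, 0.9, utility)
eu_all = expected_utility_treat_all(0.2, utility)
eu_none = expected_utility_treat_none(0.2, utility)
print(f"test-guided: {eu_test:.3f}, treat all: {eu_all:.3f}, treat none: {eu_none:.3f}")
```

With these assumed inputs the test-guided strategy has the highest expected utility, but the ranking can change as prevalence or the utilities change, which is the kind of sensitivity a decision analysis is meant to expose.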

SOCIETAL EFFICACY

Studies on the societal value of diagnostic tests conducted in the ED have not been reported. In this category the emphasis would shift from measuring benefit accrued to patients, to the benefit provided for society as a whole.1, 39

The methods for conducting this type of study are complex. All direct and indirect costs and benefits associated with the diagnostic process must be included. An assessment of societal benefit includes considerations such as resource utilization, worker productivity,

DISCUSSION

There is a growing awareness of the need for more appropriate utilization of diagnostic tests.3, 12, 43 Although the sensitivity and specificity of a test are frequently known, there are few hard data demonstrating the clinical efficacy of many tests. This is particularly true for screening tests in the ambulatory setting and “routine” admission tests.3

In response to this awareness, several disciplines have emerged, including medical technology assessment. Medical technology assessment refers

References (49)

  • F Mach et al.

    Rapid bedside whole blood cardiospecific troponin T immunoassay for the diagnosis of acute myocardial infarction

    Am J Cardiol

    (1995)
  • IH Jafri et al.

    Evaluation of the clinical impact of endoscopic ultrasonography in gastrointestinal disease

    Gastrointest Endosc

    (1996)
  • DG Fryback et al.

    The efficacy of diagnostic imaging

    Med Decis Making

    (1991)
  • DL Kent et al.

    Disease, level of impact, and quality of research methods: Three dimensions of clinical efficacy assessment applied to magnetic resonance imaging

    Invest Radiol

    (1992)
  • MD Silverstein et al.

    Conceptual framework for evaluating laboratory tests: Case-finding in ambulatory patients

    Clin Chem

    (1994)
  • The Working Group Methods for Prognosis and Decision Making

    Memorandum for the evaluation of diagnostic measures

    J Clin Chem Clin Biochem

    (1990)
  • YT van der Schouw et al.

    Guidelines for the assessment of new diagnostic tests

    Invest Radiol

    (1995)
  • M Muller-Bardorff et al.

    Development and characterization of a rapid assay for bedside determinations of cardiac troponin T

    Circulation

    (1995)
  • PO Collinson et al.

    Multicentre evaluation of an immunological rapid test for the detection of troponin T in whole blood samples

    Eur J Clin Chem Clin Biochem

    (1996)
  • DA Noe et al.

    Laboratory Medicine: The Selection and Interpretation of Clinical Laboratory Studies

  • W Rottbauer et al.

    Troponin T: A diagnostic marker for myocardial infarction and minor cardiac cell damage

    Eur Heart J

    (1996)
  • EM Ohman et al.

    Cardiac troponin T levels for risk stratification in acute myocardial ischemia

    N Engl J Med

    (1996)
  • MC Reid et al.

    Use of methodological standards in diagnostic test research–Getting better but still not good

    JAMA

    (1995)
  • JB Henry

    Clinical Diagnosis and Management by Laboratory Methods

  • DL Streiner

    Learning how to differ: Agreement and reliability statistics in psychiatry

    Can J Psychiatry

    (1995)
  • DA Noe

    The Logic of Laboratory Medicine

  • DJ Karras

    Statistical methodology: II. Reliability and validity assessment in study design, part A

    Acad Emerg Med

    (1997)
  • JM Bland et al.

    Statistical methods for assessing agreement between two methods of clinical measurement

    Lancet

    (1986)
  • HC Sox et al.

    Medical Decision Making, Newton

  • JF Tucker et al.

    Early diagnostic efficiency of cardiac troponin I and troponin T for acute myocardial infarction

    Acad Emerg Med

    (1997)
  • LB Lusted

    The clearing “haze”: A view from my window

    Med Decis Making

    (1991)
  • TH Lee et al.

    Ruling out acute myocardial infarction—A prospective multicenter validation of a 12-hour strategy for patients at low risk

    N Engl J Med

    (1991)
  • HD Royal

    Technology assessment: Scientific challenges

    AJR Am J Roentgenol

    (1994)
  • JA Hanley et al.

    A method of comparing the areas under receiver operating characteristic curves derived from the same cases

    Radiology

    (1983)
Cited by (37)

    • Methodology of method comparison studies evaluating the validity of cardiac output monitors: A stepwise approach and checklist

      2016, British Journal of Anaesthesia
      Citation Excerpt :

      This is important because the performance of CO monitors may differ considerably depending on (patho)physiological conditions in the patient.1 2 Moreover, method comparison research represents only the initial part of the validation process of new CO monitors.30 Besides technical efficacy, the ultimate goal of any newly developed monitor is to improve patient outcome and to be cost-effective.

    • Is heart rate variability better than routine vital signs for prehospital identification of major hemorrhage?

      2015, American Journal of Emergency Medicine
      Citation Excerpt :

      The second implication relates to research methodology. By way of background, Pearl [43] described a 7-tier hierarchical approach to evaluating diagnostic testing. The type of analysis in the current report—directly comparing HRV to routine vital signs—corresponds to Pearl's third tier “diagnostic thinking efficacy,” which includes the “percentage of cases in which the final diagnosis changed after testing.”

    • The six-item screener and AD8 for the detection of cognitive impairment in geriatric emergency department patients

      2011, Annals of Emergency Medicine
      Citation Excerpt :

      This is precisely why such instruments need to be validated within the environments in which they could be used.60,61 Diagnostic tests are therefore subjected to a hierarchic outcomes approach progressing from technical value to diagnostic accuracy to clinical outcome efficacy and societal efficacy.62 Because the SIS evaluates only 2 domains (recall, orientation) of cognitive dysfunction, future assessment of screening tools in the ED should evaluate different or additional domains.

    • Emergency Ultrasound Guidelines

      2009, Annals of Emergency Medicine
    • Managing Laboratory Test Use: Principles and Tools

      2007, Clinics in Laboratory Medicine
      Citation Excerpt :

      In particular, decisions about evaluation methodology and marketing both have important downstream effects on the use of the tests they develop. Fryback and Thornbury [1] have proposed a useful hierarchy, which illustrates how the use of a test is dependent on a number of factors beyond the technology embedded in the test (Box 1) [2]. They also use this hierarchy to illustrate how efficacy at a particular level is generally dependent on efficacy at all lower levels, and does not guarantee efficacy at any higher levels.


    Address for reprints: William Pearl, MD, Department of Surgery, Division of Emergency Medicine, Emory University School of Medicine, 69 Butler Street SE, Atlanta, GA 30303; 404-616-4620, fax 404-659-6012.


    0196-0644/99/$8.00 + 0

    47/1/94610
