Original article
Likelihood ratios with confidence: Sample size estimation for diagnostic test studies
Abstract
Confidence intervals are important summary measures that provide useful information from clinical investigations, especially when comparing data from different populations or sites. Studies of a diagnostic test should include both point estimates and confidence intervals for the tests' sensitivity and specificity. Equally important measures of a test's efficiency are likelihood ratios at each test outcome level. We present a method for calculating likelihood ratio confidence intervals for tests that have positive or negative results, tests with non-positive/non-negative results, and tests reported on an ordinal outcome scale. In addition, we demonstrate a sample size estimation procedure for diagnostic test studies based on the desired likelihood ratio confidence interval. The renewed interest in confidence intervals in the medical literature is important, and should be extended to studies analyzing diagnostic tests.
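The abstract does not give the formulas, so the following is only a minimal sketch. It assumes the common log-scale (delta-method) interval for a ratio of two independent proportions, which may differ in detail from the authors' procedure. The function names, the equal-group-size design, and the example counts are hypothetical illustrations, not values from the article.

```python
from math import ceil, exp, log, sqrt
from statistics import NormalDist


def lr_confidence_interval(x1, n1, x2, n2, level=0.95):
    """Likelihood ratio (x1/n1) / (x2/n2) with a log-scale Wald interval.

    x1 of n1 diseased subjects and x2 of n2 non-diseased subjects show the
    test result of interest. Assumed method: normal approximation on ln(LR).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lr = (x1 / n1) / (x2 / n2)
    # Delta-method standard error of ln(LR); requires x1 > 0 and x2 > 0.
    se_log = sqrt(1 / x1 - 1 / n1 + 1 / x2 - 1 / n2)
    return lr, lr * exp(-z * se_log), lr * exp(z * se_log)


def group_size_for_lower_limit(p1, p2, min_lower, level=0.95):
    """Smallest equal group size whose expected lower limit reaches min_lower.

    p1 and p2 are the anticipated proportions with the result of interest in
    the diseased and non-diseased groups (hypothetical planning values).
    """
    lr = p1 / p2
    if not lr > min_lower > 0:
        raise ValueError("min_lower must lie between 0 and the anticipated LR")
    z = NormalDist().inv_cdf(0.5 + level / 2)
    # Variance of ln(LR) equals per_subject_var / n when both groups have size n.
    per_subject_var = (1 - p1) / p1 + (1 - p2) / p2
    return ceil(per_subject_var * (z / log(lr / min_lower)) ** 2)


# Hypothetical study: 90 of 100 diseased and 20 of 100 non-diseased test positive.
lr, lower, upper = lr_confidence_interval(90, 100, 20, 100)
print(f"LR+ = {lr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")

# Group size needed so the expected lower limit of LR+ is at least 2.0.
print(group_size_for_lower_limit(0.9, 0.2, 2.0))
```

The interval is built on the log scale because the sampling distribution of a ratio is skewed; the resulting limits are asymmetric around the point estimate.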
Cited by (836)
Performance of the European Society of Cardiology 0/1-hour algorithm with high-sensitivity cardiac troponin T at 90 days among patients with known coronary artery disease
2024, American Journal of Emergency Medicine
The European Society of Cardiology (ESC) 0/1-h high sensitivity troponin T (hs-cTnT) algorithm does not differentiate risk based on known coronary artery disease (CAD: prior myocardial infarction [MI], coronary revascularization, or ≥ 70% coronary stenosis). We recently evaluated its performance among patients with known CAD at 30-days, but little is known about its longer-term risk prediction. The objective of this study is to determine and compare the performance of the algorithm at 90-days among patients with and without known CAD.
We performed a pre-planned subgroup analysis of the STOP-CP cohort, which prospectively enrolled ED patients ≥21 years old with symptoms suggestive of ACS without ST-elevation on initial ECG across 8 US sites (1/25/2017–9/6/2018). Participants with 0- and 1-h hs-cTnT measures (Roche, Basel, Switzerland) were stratified into rule-out, observe, and rule-in groups using the ESC 0/1-h algorithm. Algorithm performance was tested among patients with or without known CAD, as determined by the treating provider. The primary outcome was cardiac death or MI at 90-days. Fisher's exact tests were used to compare 90-day event and rule-out rates between patients with and without known CAD. Negative predictive values (NPVs) for 90-day cardiac death or MI with exact 95% confidence intervals were calculated and compared using Fisher's exact test.
The STOP-CP study accrued 1430 patients, of whom 31.4% (449/1430) had known CAD. Cardiac death or MI at 90 days was more common in patients with known CAD than in those without [21.2% (95/449) vs. 10.0% (98/981); p < 0.001]. Using the ESC 0/1-h algorithm, 39.6% (178/449) of patients with known CAD and 66.1% (648/981) of patients without known CAD were ruled out (p < 0.001). Among rule-out patients, 90-day cardiac death or MI occurred in 3.4% (6/178) of patients with known CAD and 1.2% (8/648) without known CAD (p = 0.09). NPV for 90-day cardiac death or MI was 96.6% (95%CI 92.8–98.8) among patients with known CAD and 98.8% (95%CI 97.6–99.5) in patients without known CAD (p = 0.09).
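As an illustration of how the reported NPV intervals can be reproduced from the counts, here is a minimal sketch. It assumes that "exact" refers to the Clopper-Pearson binomial interval, which the abstract does not state explicitly. The function name `clopper_pearson` is hypothetical, and the limits are found by simple bisection using only the standard library.

```python
from math import comb


def clopper_pearson(x, n, alpha=0.05):
    """Exact (Clopper-Pearson) confidence interval for a proportion x/n."""

    def upper_tail(k, p):
        # P(X >= k) for X ~ Binomial(n, p); increasing in p.
        return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

    def solve(k, target):
        # Bisection for the p at which P(X >= k) equals target.
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if upper_tail(k, mid) < target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    lower = 0.0 if x == 0 else solve(x, alpha / 2)
    upper = 1.0 if x == n else solve(x + 1, 1 - alpha / 2)
    return lower, upper


# Rule-out patients with known CAD: 178 ruled out, 6 events, so 172 event-free.
npv = 172 / 178
lower, upper = clopper_pearson(172, 178)
print(f"NPV = {npv:.1%}, 95% CI {lower:.1%} to {upper:.1%}")
```

With 172 event-free patients among 178, the sketch gives limits close to the reported 92.8–98.8.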
Patients with known CAD who were ruled out using the ESC 0/1-h hs-cTnT algorithm had a high rate of missed 90-day cardiac events, suggesting that the ESC 0/1-h hs-cTnT algorithm may not be safe for use among patients with known CAD.
High-Sensitivity Cardiac Troponin T to Optimize Chest Pain Risk Stratification (STOP-CP; ClinicalTrials.gov: NCT02984436; https://clinicaltrials.gov/ct2/show/NCT02984436).
Validation of the ACC Expert Consensus Decision Pathway for Patients With Chest Pain
2024, Journal of the American College of Cardiology
The American College of Cardiology (ACC) recently published an Expert Consensus Decision Pathway for chest pain.
The purpose of this study was to validate the ACC Pathway in a multisite U.S. cohort.
An observational cohort study of adults with possible acute coronary syndrome was conducted. Patients were accrued from 5 U.S. Emergency Departments (November 1, 2020, to July 31, 2022). ECGs and 0- and 2-hour high-sensitivity troponin (Beckman Coulter) measures were used to stratify patients according to the ACC Pathway. The primary safety outcome was 30-day all-cause death or myocardial infarction (MI). Efficacy was defined as the proportion stratified to the rule-out zone. Negative predictive value for 30-day death or MI was assessed among the whole cohort and in a subgroup of patients with coronary artery disease (CAD) (prior MI, revascularization, or ≥70% coronary stenosis).
ACC Pathway assessments were complete in 14,395 patients, of whom 51.7% (7,437 of 14,395) were women with a median age of 56 years (Q1-Q3: 44-68 years). Known CAD was present in 23.5% (3,386 of 14,395) and 30-day death or MI occurred in 8.1% (1,168 of 14,395). The ACC Pathway had an efficacy of 48.1% (95% CI: 47.3%-49.0%). Among patients in the rule-out zone, 0.3% (22 of 6,930) had death or MI at 30 days, yielding a negative predictive value of 99.7% (95% CI: 99.5%-99.8%). In patients with known CAD, 20.0% (676 of 3,386) were classified to the rule-out zone, of whom 1.5% (10 of 676) had death or MI.
The ACC expert consensus decision pathway was safe and efficacious. However, it may not be safe for use among patients with known CAD.
Short-term Detection of Fast Progressors in Glaucoma: The Fast Progression Assessment through Clustered Evaluation (Fast-PACE) Study
2024, Ophthalmology
To evaluate the performance of an intensive, clustered testing approach in identifying eyes with rapid glaucoma progression over 6 months in the Fast Progression Assessment through Clustered Evaluation (Fast-PACE) Study.
Prospective cohort study.
A total of 125 eyes from 65 primary open-angle glaucoma (POAG) subjects.
Subjects underwent 2 sets of 5 weekly visits (clusters) separated by an average of 6 months and then were followed with single visits every 6 months for an overall mean follow-up of 25 months (mean of 17 tests). Each visit consisted of testing with standard automated perimetry (SAP) 24-2 and 10-2, and spectral-domain OCT (SD-OCT). Progression was assessed using trend analyses of SAP mean deviation (MD) and retinal nerve fiber layer (RNFL) thickness. Generalized estimating equations were applied to adjust for correlations between eyes for confidence interval (CI) estimation and hypothesis testing.
Diagnostic accuracy of the 6-month clustering period to identify progression detected during the overall follow-up.
A total of 19 of 125 eyes (15%, CI, 9%–24%) progressed based on SAP 24-2 MD over the 6-month clustering period. A total of 14 eyes (11%, CI, 6%–20%) progressed on SAP 10-2 MD, and 16 eyes (13%, CI, 8%–21%) progressed by RNFL thickness, with 30 of 125 eyes (24%, CI, 16%–34%) progressing by function, structure, or both. Of the 35 eyes progressing during the overall follow-up, 25 had progressed during the 6-month clustering period, for a sensitivity of 71% (CI, 53%–85%). Of the 90 eyes that did not progress during the overall follow-up, 85 also did not progress during the 6-month period, for a specificity of 94% (CI, 88%–98%). Of the 14 eyes considered fast progressors by SAP 24-2, SAP 10-2, or SD-OCT during the overall follow-up, 13 were identified as progressing during the 6-month cluster period, for a sensitivity of 93% (CI, 66%–100%) for identifying fast progression with a specificity of 85% (CI, 77%–90%).
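The sensitivity and specificity quoted above follow directly from the counts, and the same counts give the likelihood ratios that the original article discusses. The sketch below is only an illustration: the likelihood ratios are derived here and are not reported in the abstract, the function name is hypothetical, and no intervals are computed because the study adjusted its intervals for between-eye correlation with generalized estimating equations.

```python
def likelihood_ratios(tp, fn, tn, fp):
    """Sensitivity, specificity, LR+ and LR- from 2x2 counts.

    Assumes fp > 0 and tn > 0 so that neither ratio divides by zero.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    false_positive_rate = fp / (tn + fp)
    lr_positive = sensitivity / false_positive_rate
    lr_negative = (1 - sensitivity) / specificity
    return sensitivity, specificity, lr_positive, lr_negative


# 35 eyes progressed overall, 25 of them within the 6-month cluster period;
# 90 eyes did not progress overall, 85 of them also not within 6 months.
sens, spec, lr_pos, lr_neg = likelihood_ratios(tp=25, fn=10, tn=85, fp=5)
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
```

For these counts the positive likelihood ratio is about 12.9 and the negative likelihood ratio about 0.30.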
Clustered testing in the Fast-PACE Study detected fast-progressing glaucoma eyes over 6 months. The methodology could be applied in clinical trials investigating interventions to slow glaucoma progression and may be of value for short-term assessment of high-risk subjects.
Validity and Accuracy of Step Count as an Indicator of a Sedentary Lifestyle in People With Chronic Obstructive Pulmonary Disease
2023, Archives of Physical Medicine and Rehabilitation
To determine the validity and accuracy of <5000 steps/day as a sedentary lifestyle indicator, and the optimal step count cut point value for indicating a sedentary lifestyle in people with chronic obstructive pulmonary disease (COPD).
Analysis of baseline data from a randomized clinical trial.
Sydney, Australia.
Stable COPD on the waitlist for pulmonary rehabilitation.
Not applicable.
Step count and time in sedentary behavior (SB) were assessed using thigh-worn accelerometry. A sedentary lifestyle was defined as <5000 steps/day. Pearson correlation coefficients were analyzed between step count and time spent in SB. Sensitivity, specificity, and accuracy were calculated for the <5000 steps/day threshold. Receiver operating characteristic curves with the area under the curve were computed for step count in identifying a sedentary lifestyle.
69 people with COPD (mean age=74 years, SD=9; forced expiratory volume in 1 second, mean=55%, SD=19 predicted) had sufficient wear data for analysis. There was a moderate inverse correlation between step count and time spent in SB (r=−0.58, P<.001). Step count had a fair discriminative ability for identifying a sedentary lifestyle (area under the curve=0.80, 95% confidence interval [CI], 0.68-0.91). The <5000 steps/day threshold had a sensitivity, specificity, and accuracy of 82% (95% CI, 70-94), 70% (95% CI, 54-86), and 78%, respectively. A lower threshold of <4300 steps/day was more accurate for ruling in a sedentary lifestyle.
Compared with thigh-worn accelerometry, <5000 steps/day is a valid and reasonably accurate indicator of a sedentary lifestyle in this population.
Multiparametric Quantitative Imaging Biomarkers for Phenotype Classification: A Framework for Development and Validation
2023, Academic Radiology
This manuscript is the third in a five-part series related to statistical assessment methodology for technical performance of multi-parametric quantitative imaging biomarkers (mp-QIBs). We outline approaches and statistical methodologies for developing and evaluating a phenotype classification model from a set of multiparametric QIBs. We then describe validation studies of the classifier for precision, diagnostic accuracy, and interchangeability with a comparator classifier. We follow with an end-to-end real-world example of development and validation of a classifier for atherosclerotic plaque phenotypes. We consider diagnostic accuracy and interchangeability to be clinically meaningful claims for a phenotype classification model informed by mp-QIB inputs, aiming to provide tools to demonstrate agreement between imaging-derived characteristics and clinically established phenotypes. Understanding that we are working in an evolving field, we close our manuscript with an acknowledgement of existing challenges and a discussion of where additional work is needed. In particular, we discuss the challenges involved with technical performance and analytical validation of mp-QIBs. We intend for this manuscript to further advance the robust and promising science of multiparametric biomarker development.
A wearable patch based remote early warning score (REWS) in major abdominal cancer surgery patients
2023, European Journal of Surgical Oncology
The shift toward remote patient monitoring methods to detect clinical deterioration requires testing of wearable devices in real-life clinical settings. This study aimed to develop a remote early warning scoring (REWS) system based on continuous measurements using a wearable device, and compare its diagnostic performance for the detection of deterioration to the diagnostic performance of the conventional modified early warning score (MEWS).
The study population of this prospective, single center trial consisted of patients who underwent major abdominal cancer surgery and were monitored using routine in-hospital spotcheck measurements of the vital parameters. Heart and respiratory rates were measured continuously using a wireless accelerometer patch (HealthDot). The prediction by MEWS of deterioration toward a complication graded Clavien-Dindo of 2 or higher was compared to the REWS derived from continuous measurements by the wearable patch.
A total of 103 patients and 1909 spot-check measurements were included in the analysis. Postoperative deterioration was observed in 29 patients. For both EWS systems, the sensitivity (MEWS: 0.20 95% CI: [0.13–0.29], REWS: 0.20 95% CI: [0.13–0.29]) and specificity (MEWS: 0.96 95% CI: [0.95–0.97], REWS: 0.96 95% CI: [0.95–0.97]) were assessed.
The diagnostic value of the REWS method, based on continuous measurements of the heart and respiratory rates, is comparable to that of the MEWS in patients following major abdominal cancer surgery. The wearable patch could detect the same number of deteriorations without requiring manual spot-check measurements.