Background National Hospital Quality Measures (NHQM) should accurately reflect quality of care, as they increasingly impact reimbursement and reputation. However, similar to risk adjustment of outcomes measures, NHQM process measures pose unique methodological concerns, including lack of representativeness of the final denominator population after exclusions. This study determines population size and characteristics for each acute myocardial infarction (AMI) measure, reasons for exclusion from the measures, and variation in exclusion rates among hospitals.
Methods and results 163 144 discharges from 172 University HealthSystem Consortium hospitals between 2008-Q4 and 2013-Q3 were examined, including characteristics and propensity scores of included and excluded groups. Measure exclusions ranged from 17.8% (discharge aspirin) to 90.1% (percutaneous coronary intervention, PCI, within 90 min), with substantial variation across hospitals. Median annual denominator size (IQR) for PCI within 90 min was 28 (20, 44) at major teaching hospitals, versus 10 (0, 25) at non-teaching hospitals. Patients most likely to be excluded (in the 10th vs 1st propensity decile) were older (mean age (SD) of 78.1 (10.8) vs 50.3 (8.6) years), more likely to have Medicare (90.5% vs 0.9%), had more documented comorbidities (15.6 (4.6) vs 6.2 (2.5) hierarchical clinical condition categories) and higher admission mortality risk (Major or Extreme 80.9% vs 7.3%, respectively), and experienced higher inpatient mortality (10.0% vs 1.6%).
Conclusions Exclusion from AMI measures varied substantially among hospitals, sample sizes were very small for some measures (PCI and ACE inhibitor measures) and measures often excluded high-risk populations. This has implications for the representativeness and comparability of the measures and provides insight for future measure development.
- Quality measurement
- Health policy
- Pay for performance
- Performance measures
The National Hospital Quality Measures (NHQM) are a set of performance measures, mostly evidence-based processes of care, derived from the results of the Cooperative Cardiovascular Project. This project found that many patients were not receiving indicated care for acute myocardial infarction (AMI).1–5 In an attempt to incentivise hospitals to measure and improve care, these measures were ultimately adapted into reporting and pay-for-performance strategies, most significantly the Centers for Medicare and Medicaid Services (CMS) Value-Based Purchasing (VBP) programme.6–10 For almost a decade, these measures have been the primary indicators of the quality of cardiovascular care provided in US hospitals. Consequently, providers have invested substantial resources in data collection, analysis and reporting of these measures.
There is an ongoing debate about the merits of process measures as opposed to outcome measures for accountability and reimbursement.10–12 Outcome measures are the ultimate validators of care quality, as noted by Professor Avedis Donabedian half a century ago. However, these measures require robust risk adjustment to fairly account for inherent differences in patient case mix among providers, and the adequacy of this adjustment has often been challenged.13 For example, important risk predictors are sometimes not available, and coding of comorbidities may vary across institutions.
Process measures have been viewed as a potential solution to the challenges of outcomes measures. They theoretically avoid the need for risk adjustment, as all patients deserve to receive all indicated care. These measures are also directly actionable, and results are immediately available. As process measures were more widely implemented, however, it became apparent that they also have troubling issues, some of which might limit their accuracy as measures of care quality (including, in some instances, a lack of correlation between process measure performance and outcomes).14 From a methodological perspective, they appear to have a potential weakness that is somewhat analogous to that of risk adjustment in outcome measures. Not all generally desirable care processes are applicable to all patients, and virtually all process measures exclude certain populations.
Performance scores are based on the denominator population remaining after exclusions, but this creates two concerns. First, the final denominator population for a measure may be only a small fraction of the total number of patients with a given condition that the hospital treated. Second, because hospitals care for different types of patients (eg, tertiary teaching hospitals vs community hospitals), some may have more or fewer exclusions than others, which will potentially impact their performance scores. Although exclusions are necessary for clinical appropriateness and to ensure a homogeneous measurement population for accurate comparisons between hospitals, there is concern that variability in exclusions among hospitals may limit the comparability of measurements. Additionally, some large and potentially important groups of patients may be excluded from the measurement population.15 Previous work has demonstrated that a significant percentage of Medicare beneficiaries are excluded from these measures, limiting their representativeness.16 Exclusions from the measure population were reported to increase during the period from 1994 through 2001.17 Most importantly, patients who were excluded from the measures had 1-year mortality rates double those of CMS-included patients.18 Despite these important findings, previous studies have not assessed exclusion rates across a large population of inpatients, nor have they had access to specific reasons for exclusion from the measurement population.
The aims of this study were to (1) define the exclusion rates from the original populations for each of the measures, and the size of the final denominator populations; (2) determine the reasons for exclusion from the denominator measure population; (3) describe the variation in denominator exclusion rate among hospitals; and (4) investigate those patient and hospital factors that influence exclusion from the measures, and the resulting differences in composition of included and excluded populations.
The University HealthSystem Consortium (UHC), a national organisation of academic medical centre members and affiliated hospitals, collects data to facilitate quality measurement and comparative analysis. UHC has 117 participating academic medical centres with 338 affiliated hospitals. The UHC clinical database/resource manager includes patient-level data with detailed demographics and diagnosis and procedure codes (International Classification of Diseases 9, ICD-9-CM, codes). A subset of participating hospitals use UHC to collect data for submission to the NHQM, and these hospitals were the basis for our study (195 total hospitals in our source dataset). For patients meeting eligibility for inclusion in the NHQM, the database also includes patient-level information regarding the measure population, reasons for denominator exclusion and measure performance.
We matched data from UHC with the American Hospital Association Annual Survey (AHA survey) for 2011.19 All but three hospitals in UHC were linked with the survey and these three hospitals were excluded from analysis. Teaching status was determined by whether the hospital had at least one training programme approved by the Accreditation Council for Graduate Medical Education, and whether the hospital was a member of the Association of American Medical Colleges Council of Teaching Hospitals (COTH). We classified hospitals into COTH teaching, non-COTH teaching and non-teaching hospitals.
Patient population and study period
Initial eligibility criteria for sampling in the AMI population required that patients were admitted for hospital inpatient care with an ICD-9-CM principal diagnosis code for AMI (410.x0/410.x1), were older than 18 years at admission and had length of stay less than 120 days.6 From the patient population that meets these initial criteria, hospitals sample a portion of eligible AMI discharges based on sample size guidelines published in the measure specifications. Each hospital has some flexibility to determine its actual sample size as long as it complies with the sample size guidelines; consequently, sampling strategies and sizes differ among hospitals. For certain AMI measures (specifically AMI-8a, primary percutaneous coronary intervention (PCI) within 90 min), many hospitals choose to include the entire eligible population.6
Our final study population consisted of all patients included by hospitals in their initial AMI population over 5 years between Quarter 4 of 2008 (2008-Q4) and Quarter 3 of 2013 (2013-Q3).
The study period is covered by NHQM Measure Specification Manual Versions 2.5b through 4.2b. All measures except the AMI-10 statin at discharge measure were collected continuously during the study period; the AMI-10 measure was collected starting in 2010-Q4. This limitation is noted where applicable, specifically in the analysis of variation by hospital teaching status.
The three arrival measures (and their two derivatives) consist of aspirin at arrival (AMI-1) and time to primary PCI (median time, AMI-8; and therapy within 90 min of hospital arrival, AMI-8a). An exceedingly small group of patients were eligible for fibrinolysis and we do not report any data for this measure. The four discharge prescription measures include aspirin (AMI-2), ACE inhibitor (ACEi) or angiotensin receptor blocker (ARB) for patients with left ventricular systolic dysfunction (LVSD) (AMI-3), β-blocker (AMI-5) and statin (AMI-10).
Comorbidities and severity of illness
Comorbidities, severity of illness and risk of mortality were assessed. UHC generates severity and mortality scores at admission and discharge using the 3M All Patient Refined Diagnosis Related Group classification system for all inpatient encounters.20–22 This system uses ICD-9 codes to group patients into four categories based on risk: ‘Mild’, ‘Moderate’, ‘Major’ and ‘Extreme’.22 We also used the patient's ICD-9 codes to determine specific comorbidities at the time of admission based on the CMS risk-adjusted 30-day mortality model.23–25 We used the ‘Present-on-Admission’ flag to confirm that a given comorbidity was present on admission and unlikely to be a complication developed during hospitalisation. We report data both for the total number of CMS Hierarchical Clinical Conditions (HCC) categories (out of 189 possible) and for a subset of 14 comorbidities used in the CMS AMI mortality model. We adjusted for these 14 comorbidities in our multivariable modelling (see below).
For our analyses, we established a number of study exclusions (separate from the NHQM criteria) related to potential coding and data quality issues (see online supplementary eTable 1 for detailed information). For example, hospitals were completely excluded from the study if 100% of their total discharges were in the excluded group, if they had fewer than 20 discharges during the entire study period, or if they could not be matched with the AHA survey. Specific quarters were excluded from a hospital's data if there were fewer than five discharges in that quarter (which is consistent with the measure specification), or if the hospital reported 100% of their discharges in the excluded group for that quarter. Eight individual patients were excluded because of inaccurate diagnosis coding from the submitting hospitals (principal diagnosis was not AMI on the ICD-9 codes specification table). A small group of patients in 2013-Q3 had multiple reasons listed for exclusion from the measures. We took the first reason for exclusion in the measure specification algorithm as the primary reason for exclusion for that patient.
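The hospital-level and quarter-level study exclusions described above can be sketched as a simple filter. This is an illustrative sketch only, not the authors' code: the record layout, the function name `apply_study_exclusions` and the order in which the rules are applied (hospital rules first, then quarter rules) are assumptions, while the thresholds of 20 and 5 discharges come from the text.

```python
from collections import defaultdict

MIN_HOSPITAL_DISCHARGES = 20  # from the text: fewer than 20 over the study period
MIN_QUARTER_DISCHARGES = 5    # from the text: fewer than five in a quarter


def apply_study_exclusions(records, aha_matched):
    """Return the set of (hospital, quarter) cells retained for analysis.

    records: iterable of (hospital, quarter, excluded) tuples, one per discharge,
             where `excluded` is True if the discharge was excluded from the measure.
    aha_matched: set of hospitals that could be linked to the AHA survey.
    The tuple layout and rule ordering are hypothetical.
    """
    by_hospital = defaultdict(list)
    by_quarter = defaultdict(list)
    for hospital, quarter, excluded in records:
        by_hospital[hospital].append(excluded)
        by_quarter[(hospital, quarter)].append(excluded)

    # Hospital-level rules: AHA match, at least 20 discharges, not 100% excluded.
    kept_hospitals = {
        h for h, flags in by_hospital.items()
        if h in aha_matched
        and len(flags) >= MIN_HOSPITAL_DISCHARGES
        and not all(flags)
    }
    # Quarter-level rules: at least five discharges, not 100% excluded.
    return {
        (h, q) for (h, q), flags in by_quarter.items()
        if h in kept_hospitals
        and len(flags) >= MIN_QUARTER_DISCHARGES
        and not all(flags)
    }


# Hypothetical toy data: hospital A has three usable quarters, one quarter that is
# 100% excluded and one quarter with too few discharges; B is too small; C is unmatched.
records = []
for q in ["2012-Q1", "2012-Q2", "2012-Q3"]:
    records += [("A", q, i % 2 == 0) for i in range(7)]
records += [("A", "2012-Q4", True)] * 5
records += [("A", "2013-Q1", False)] * 3
records += [("B", "2012-Q1", False)] * 10
records += [("C", "2012-Q1", False)] * 25

kept = apply_study_exclusions(records, aha_matched={"A", "B"})
```

In this toy example only the first three quarters of hospital A are retained.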
Overall hospital inclusion/exclusion rates and original and final denominator population sizes were determined for each process measure. Patient and hospital characteristics of the included and excluded populations were compared using χ2 tests for categorical variables and t tests for continuous variables. Some analyses (exclusion variation by hospital status and detailed exclusion reasons) were limited to the period from 2010-Q4 to 2013-Q3 (after the addition of the AMI-10 measure, during which the specifications underwent only very minor changes). We assessed unadjusted between-hospital variation in mean exclusion rate using coefficients of variation (CV, SD/mean), where CV is the size of the SD relative to the mean, expressed as a decimal. A larger CV indicates a larger SD relative to the mean. Adjusted between-hospital variation was assessed using the covariance parameter estimates from the generalised logistic mixed models, described below. We also determined whether there was a correlation between exclusion rate and performance on the measure using unadjusted Pearson correlation.
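As a minimal illustration of the two unadjusted statistics named above, the following sketch computes a CV (SD/mean) and a Pearson correlation for a handful of hospital-level rates. The input values are hypothetical, and the use of the sample SD (n − 1 denominator) is an assumption, as the text does not state which form was used.

```python
import math
import statistics


def coefficient_of_variation(rates):
    """CV = SD / mean, expressed as a decimal (sample SD assumed)."""
    return statistics.stdev(rates) / statistics.mean(rates)


def pearson_r(x, y):
    """Unadjusted Pearson correlation between two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical hospital-level exclusion rates and measure performance rates.
exclusion_rates = [0.10, 0.20, 0.30, 0.40]
performance = [0.99, 0.97, 0.96, 0.92]

cv = coefficient_of_variation(exclusion_rates)
r = pearson_r(exclusion_rates, performance)
```

With these made-up inputs the CV is about 0.52 and the correlation is strongly negative, the same direction as the association reported in the results.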
Propensity models were used to compare group differences on observed covariates, using multivariable logistic regression models. The goal of these analyses was to identify patients at high risk of exclusion from the measures. The propensity score analysis allowed us to summarise large numbers of covariates in a single score to determine whether characteristics differed significantly between the included and excluded groups, and to identify types of patients at low and high risk of exclusion. A propensity-score-stratified analysis comparing the 1st decile (least likely to be excluded) to the 10th decile (most likely to be excluded) of the propensity score distribution was performed. This allowed us to compare the association of patient and hospital characteristics with patient inclusion or exclusion.
Multivariable logistic mixed models with a hospital-specific random effect were used to account for within-hospital clustering of patient demographic and clinical factors between included and excluded groups, and between-hospital variation. The outcome variable was patient exclusion status (inclusion or exclusion) rather than a clinical outcome such as mortality. SAS GLIMMIX was used to estimate the log-odds of exclusion from each measure. A propensity score, defined as the predicted probability of being in the excluded group, was calculated for each patient. Patients with similar propensity scores have comparable characteristics.
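The decile comparison can be sketched as follows, taking the propensity scores as given (the mixed model itself is not reproduced here). The rank-based decile assignment, the helper names and the toy data are assumptions for illustration; the study's exact binning method is not described in the text.

```python
def assign_deciles(scores):
    """Assign each score a decile from 1 (lowest predicted probability of
    exclusion) to 10 (highest), using rank order. Ties are broken by position."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    deciles = [0] * n
    for rank, i in enumerate(order):
        deciles[i] = rank * 10 // n + 1
    return deciles


def mean_in_decile(values, deciles, d):
    """Mean of a patient characteristic among patients in decile d."""
    picked = [v for v, k in zip(values, deciles) if k == d]
    return sum(picked) / len(picked)


# Hypothetical data: 20 patients whose propensity score rises with age.
scores = [i / 20 for i in range(20)]
ages = [40 + 2 * i for i in range(20)]

deciles = assign_deciles(scores)
mean_age_decile_1 = mean_in_decile(ages, deciles, 1)
mean_age_decile_10 = mean_in_decile(ages, deciles, 10)
```

Comparing a characteristic such as mean age between decile 1 and decile 10 mirrors the contrast drawn in the stratified analysis, here on toy data only.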
Covariates were selected based on the CMS AMI mortality model and additional factors that we believed might impact exclusion rates (race, ethnicity, insurance, severity of illness, weekend admission for arrival measures, weekend discharge and the use of intensive care unit for discharge measures). To maximise predictive accuracy, and given our large sample size, we employed a liberal approach to include as many available covariates as possible (an all-in approach).
All statistical analyses were performed using SAS software, V.9.4 (SAS Institute, Cary, North Carolina, USA). Selected graphics were produced with R (V.3.1.2), ggplot2 package (V.1.0.1) (R Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing, 2014).26
The study was approved by the institutional review board at Partners Healthcare, and given a waiver of informed consent.
The final eligible population for AMI-1 through AMI-8a included 163 144 discharges (163 143 for AMI-8a) from 172 hospitals between 2008-Q4 and 2013-Q3. The study population for AMI-10 (statin at discharge) was 109 293 discharges from 163 hospitals between 2010-Q4 and 2013-Q3 (due to the later introduction of this measure). The majority of hospitals (61.6%) were COTH teaching hospitals, accounting for 83.3% of discharges. Among COTH hospitals, 96.1% were PCI-capable, compared with 50.1% and 53.7% of non-COTH teaching and non-teaching hospitals, respectively.
Final measure denominator size
Notably, the median (IQR) yearly measurement population denominator (ie, included patients) for PCI within 90 min (AMI-8a) was only 28 (20, 44) at COTH hospitals, 12 (0, 29) at non-COTH teaching hospitals and 10 (0, 25) at non-teaching hospitals. The final denominator population sizes for ACEi/ARB for LVSD (AMI-3) were likewise small (medians (IQR) 49 (33, 73), 13 (8, 30) and 9 (3, 20) at COTH, non-COTH teaching and non-teaching hospitals, respectively). The median and IQRs of the final denominator population sizes by hospital type are presented in table 1.
Measure exclusion rates
PCI within 90 min (AMI-8a) had the highest rate of exclusion, 90.1%. Aspirin at discharge (AMI-2) had the lowest rate of exclusion, 17.8%. There were some noticeable patterns in exclusion. Among the 109 293 patients eligible for all six measures, 1782 (1.6%) were included in all six and 7263 (6.6%) were excluded from all six measures. The most frequent measurement pattern observed was inclusion in all measures except AMI-8a and AMI-3, and this occurred in 58 429 (53.5%) of patients. The remaining 41 819 (38.3%) of patients were included in at least one measure. Exclusion rates and performance on each measure are provided in online supplementary eTable 2, and patterns in exclusion from more than one measure are shown in online supplementary eFigure 1.27
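The pattern counts reported above partition the 109 293 patients eligible for all six measures, and the stated percentages can be reproduced directly from the counts. The short check below uses only figures from the text; the dictionary labels are descriptive shorthand rather than terms from the study.

```python
total_eligible = 109_293  # patients eligible for all six measures

pattern_counts = {
    "included in all six": 1_782,
    "excluded from all six": 7_263,
    "included in all except AMI-8a and AMI-3": 58_429,
    "remaining mixed patterns": 41_819,
}

# The four groups should account for every eligible patient.
counts_sum = sum(pattern_counts.values())

# Percentage share of each pattern, rounded to one decimal place.
shares = {
    label: round(100 * n / total_eligible, 1)
    for label, n in pattern_counts.items()
}
```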
Correlation with performance
Exclusion rates were negatively correlated with performance on the quality measures; lower rates of measure exclusions were associated with higher rates of performance on the measures. Correlation coefficients ranged from −0.33 for AMI-3 to −0.69 for AMI-8a (p<0.001 for both) (see online supplementary eTable 3).
Variation in exclusion rates
Interhospital variation in exclusions was substantial. The measure with the greatest variation was aspirin at arrival (AMI-1), with CV as high as 0.709, 0.726 and 0.626, for COTH, non-COTH and non-teaching hospitals, respectively. Measures AMI-2, AMI-5 and AMI-10 had similarly high CV. The measure with the lowest CV was ACEi/ARB for LVSD (AMI-3), with CV of 0.051, 0.063 and 0.044, for COTH, non-COTH and non-teaching hospitals, respectively. Figure 1 illustrates hospital-specific exclusion rates stratified by teaching status, using a subset population from 2010-Q4 to 2013-Q3. Each line represents the exclusion rate for a specific hospital for that specific measure. See online supplementary eTable 4a for mean, SD and CV of exclusion rates for each measure by hospital type. Based on multivariable models using the full population, covariance parameter estimates and SDs are presented in online supplementary eTable 4b. Significant between-hospital variation in exclusion rate remained for all measures after adjusting for patient characteristics. For example, the odds of a patient being excluded from the aspirin at arrival (AMI-1) measure when admitted to a hospital 1 SD above the average exclusion rate are 5.12 times the odds when admitted to a hospital 1 SD below the average.
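One common way to read a comparison of hospitals 1 SD above and 1 SD below the average in a random-intercept logistic model is as an odds ratio of exp(2σ), where σ is the SD of the hospital random intercept on the log-odds scale. The sketch below assumes that interpretation, which the text does not state explicitly, and backs out the σ that would correspond to the reported ratio of 5.12; the actual estimate is in the supplementary table and is not reproduced here.

```python
import math


def odds_ratio_plus_vs_minus_one_sd(sigma):
    """Odds ratio comparing hospitals at +1 SD vs -1 SD of a random intercept
    with SD `sigma` on the log-odds scale (assumed interpretation)."""
    return math.exp(2 * sigma)


def implied_sigma(odds_ratio):
    """Invert the relationship above: sigma = ln(OR) / 2."""
    return math.log(odds_ratio) / 2


# Reported ratio for aspirin at arrival (AMI-1); sigma is derived, not reported.
sigma_ami1 = implied_sigma(5.12)
round_trip = odds_ratio_plus_vs_minus_one_sd(sigma_ami1)
```

Under this assumption the implied random-intercept SD for AMI-1 would be roughly 0.82 on the log-odds scale.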
Reasons for exclusion
Detailed reasons for exclusion from selected arrival and discharge measures are presented in figure 2. The single largest exclusion category consisted of 73 376 (67.1%) patients without LVSD who were not eligible for the ACEi/ARB at discharge for LVSD (AMI-3) measure. PCI within 90 min (AMI-8a) had two dominant reasons for exclusion: no documented ST-segment elevation (STEMI) or left bundle branch block (n=50 143 or 45.9%), and patient was transferred from another facility (n=41 610 or 38.1%).
Unadjusted measurement population differences
Most bivariate comparisons between included and excluded groups were statistically significant due to the large sample size (see online supplementary eTable 5 for complete unadjusted comparisons between the included and excluded groups for all measures). For the PCI within 90 min (AMI-8a) measure, the excluded patients were older (mean age (SD) 66.3 (14.2) vs 60.4 (12.5)), more likely to be discharged to another hospital (28.0% vs 12.4%) and had higher admission severity of illness and risk of mortality (Major or Extreme in 36.2% vs 21.5% for severity and 35.8% vs 17.6% for mortality).
Propensity score distributions
Model terms for each of the propensity models are presented in online supplementary eTable 6. Kernel plots of the log(PS) distributions are demonstrated in figure 3, allowing a visual estimation of the population distributions of propensity score in the included and excluded populations. The population propensity score differences are most notable for the PCI within 90 min (AMI-8a) and the discharge measures. The included and excluded populations were most similar for aspirin at arrival (AMI-1).
Patient characteristics in Decile 1 (low probability of exclusion) and Decile 10 (high probability of exclusion) of the propensity score distributions for PCI within 90 min and statin at discharge are presented in table 2. Additional patient-level characteristics in Decile 1 and Decile 10 of PS for the other measures are presented in online supplementary eTable 7. Compared with the bottom decile, patients in the top decile for exclusion from PCI within 90 min (AMI-8a) tended to be older (mean age of 78.1 (10.8) vs 50.3 (8.6) years), were more likely to have Medicare as a payer (90.5% vs 0.9%), were more likely to be transferred to another hospital (53.8% vs 6.9%), had more comorbidities (mean 15.6 (4.6) vs 6.2 (2.5) HCC categories), had higher admission severity and mortality scores (Major or Extreme in 80.9% vs 7.9% and 80.9% vs 7.3%, respectively) and had a higher mortality rate during their hospitalisation (10.0% vs 1.6%).
Our study findings highlight a number of issues impacting the usefulness of NHQM process measures for AMI as well as other process measures that have population exclusion and inclusion criteria. Although in some cases exclusions were for clinically appropriate reasons (ie, time to PCI and ACEi/ARB prescription), the rationale and appropriateness of other exclusion criteria (such as transfer exclusions) are more problematic. Excluded populations tended to include older patients on Medicare, with more comorbidities and more severe presentations.
Our data raise two primary concerns. First, high exclusion rates for some measures result in small measurement population denominators. Second, measures tend to exclude important high-risk populations (such as older patients and those with multiple comorbidities). With regard to the first concern, measures with small denominators may be unduly influenced at the margin by single patient encounters, and they may lack sufficient power to differentiate performance between hospitals. Assuming the median sample size of 28 patients at COTH hospitals for the AMI-8a measure, measure non-performance for a single patient will change the performance rate by 3.6 percentage points (1/28). Performance failure on more than two patients at this sample size would place the hospital below the 2013 VBP achievement threshold for the AMI-8a measure.28 The effect is even more pronounced at smaller hospitals with smaller measurement population sizes (despite the presence of strict sample size guidelines that relieve hospitals with small population sizes from the reporting requirement).
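The sensitivity of a performance rate to a single patient follows directly from the denominator size. The sketch below works through the arithmetic for the median AMI-8a denominators reported in the results (28, 12 and 10); the function names are illustrative, and no VBP threshold value is assumed.

```python
def single_failure_effect(denominator):
    """Percentage-point change in the performance rate caused by one failure."""
    return 100.0 / denominator


def rate_after_failures(denominator, failures):
    """Performance rate (%) when `failures` of `denominator` patients fail."""
    return 100.0 * (denominator - failures) / denominator


# Median yearly AMI-8a denominators by hospital type, as reported in the results.
median_denominators = {"COTH": 28, "non-COTH teaching": 12, "non-teaching": 10}

effects = {
    hospital_type: round(single_failure_effect(n), 1)
    for hospital_type, n in median_denominators.items()
}
rate_coth_three_failures = round(rate_after_failures(28, 3), 1)
```

At a denominator of 28, each failure moves the rate by about 3.6 percentage points, so three failures leave a rate of about 89.3%; at a denominator of 10, each failure moves the rate by 10 points.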
A second concern is that the final measure denominators excluded some important subpopulations of interest, including Medicare patients and those with high medical complexity and severity of illness. Consequently, while performance measures may reflect a hospital's ability to provide care for a typical subset of patients, they may not accurately characterise their effectiveness in caring for more complex subpopulations. For example, the AMI measures may not accurately reflect the care provided at academic medical centres, where treatment of the most complex patients, who require appropriate and timely care, is an important distinguishing factor.29 More frequent exclusion of Medicare patients from the denominator populations compounds the difficulty in accurately assessing and (through VBP programmes) improving the quality of care provided to this population.
Our study had several limitations. First, the data were derived from the UHC database, which includes primarily academic medical centres and their affiliates. Additionally, we were only able to obtain data from hospitals that use UHC to assist in submission of NHQM data to CMS. Although this limits the generalisability of the data to other groups of hospitals, UHC represents the single largest source of data available that allows determination of the reason for exclusion from the measures. Our analyses did not allow for accurate determination of patient presentation (ST-segment elevation myocardial infarction vs non-ST-segment elevation myocardial infarction), which limited some analyses (particularly those of the AMI-8a measure).
Our study is the first to examine specific reasons for exclusion from the measurement populations in a large, nationwide sample of patients, and among contemporary measures included in VBP. These findings impact the interpretation of the AMI measures for single hospitals and weaken the ability to make meaningful comparisons between hospitals, and have important implications for design of future measures for clinical conditions and pay-for-performance strategies. Our data highlight the need for transition to risk-adjusted outcomes measures with broad inclusion criteria that can more fully account for care provided to complex patients. Although AMI process measures have been retired from the VBP programme and replaced with risk-adjusted outcome measures for mortality and readmissions, newer process measure sets for stroke and venous thromboembolism have been introduced.30 These newer measure sets are largely similar in structure to earlier AMI process measures and rely on a similar framework.
A number of potential refinements to process-measure-based programmes could help mitigate the problems with small sample sizes and differences in representativeness between the included and excluded groups. Small sample sizes could be improved by increasing the minimum sample size requirement (currently 311 patients per quarter for hospitals with ≥1551 initial cases), or using a longer measurement period. Multicomponent composite measures might also increase effective sample size and provide more comprehensive assessment of quality. Measures might be developed that stratify performance for lower-risk patients versus more complex patients, rather than excluding the latter, as is currently done. Finally, requiring that hospitals publish the denominator exclusions and demographics of the initial measurement population and the final denominator population would provide external stakeholders a much better understanding of these measures, including their limitations.
Improving the quality of care provided to patients with AMI is a noble and important goal. To make real gains in quality, it is crucial that measurement is accurate and responsive to change, and that measures are representative of the populations of patients treated at various hospitals. Progressive refinement of measurement systems is critical, with the goal of ensuring relative homogeneity of the study cohorts while also achieving adequate sample size and representativeness. These considerations will further contribute to advancing the quality of care for patients with AMI, and the lessons learned from these measures are applicable to many other quality measurement programmes.
Contributors All authors contributed to the research design and drafting of the manuscript. Authors JB and DMS had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. Authors JB and XL conducted and are responsible for the data analysis.
Competing interests None declared.
Ethics approval Partners Healthcare IRB.
Provenance and peer review Not commissioned; externally peer reviewed.