Influence of Using Different Databases and ‘Look Back’ Intervals to Define Comorbidity Profiles for Patients with Newly Diagnosed Hypertension: Implications for Health Services Researchers

  • Guanmin Chen ,

    guchen@ucalgary.ca

    Affiliations Department of Community Health Sciences, University of Calgary, Calgary, Alberta, Canada, Institute of Public Health, University of Calgary, Calgary, Alberta, Canada, Research Facilitation, Alberta Health Services, Calgary, Alberta, Canada

  • Lisa Lix,

    Affiliation Department of Community Health Sciences, University of Manitoba, Manitoba, Canada

  • Karen Tu,

    Affiliation Department of Family and Community Medicine, University of Toronto/Institute for Clinical Evaluative Sciences (ICES), Toronto, Ontario, Canada

  • Brenda R. Hemmelgarn,

    Affiliations Department of Community Health Sciences, University of Calgary, Calgary, Alberta, Canada, Institute of Public Health, University of Calgary, Calgary, Alberta, Canada, Department of Medicine, University of Calgary, Calgary, Alberta, Canada

  • Norm R. C. Campbell,

    Affiliations Department of Community Health Sciences, University of Calgary, Calgary, Alberta, Canada, Department of Medicine, University of Calgary, Calgary, Alberta, Canada, Department of Pharmacology and Therapeutics, University of Calgary, Calgary, Alberta, Canada

  • Finlay A. McAlister,

    Affiliation Division of General Internal Medicine, University of Alberta, Edmonton, Alberta, Canada

  • Hude Quan,

    Affiliations Department of Community Health Sciences, University of Calgary, Calgary, Alberta, Canada, Institute of Public Health, University of Calgary, Calgary, Alberta, Canada

  • Hypertension Outcome and Surveillance Team

    Complete membership of author group can be found in the Acknowledgments.

Abstract

Objective

To determine the data sources and ‘look back’ intervals to define comorbidities.

Data Sources

Hospital discharge abstracts database (DAD), physician claims, population registry and death registry from April 1, 1994 to March 31, 2010 in Alberta, Canada.

Study Design

Newly-diagnosed hypertension cases from 1997 to 2008 fiscal years were identified and followed up to 12 years. We defined comorbidities using data sources and duration of retrospective observation (6 months, 1 year, 2 years, and 3 years). The C-statistics for logistic regression and concordance index (CI) for Cox model of mortality and cardiovascular disease hospitalization were used to evaluate discrimination performance for each approach of defining comorbidities.

Principal Findings

Comorbidity prevalence increased with a longer ‘look back’ duration. Using DAD alone underestimated the prevalence by about 75%, compared to using both DAD and physician claims. The C-statistic and CI were highest when both DAD and physician claims were used, and model performance improved when observation duration increased from 6 months to one year or longer.

Conclusion

Comorbidity prevalence is greatly affected by the data source and the duration of retrospective observation. Combining DAD and physician claims with at least one year of observation improves model performance for predicting cardiovascular disease hospitalization and one-year mortality.

Introduction

Rigorous outcomes research requires adjustment for comorbidities, as failing to adjust may call results into question and lead to erroneous conclusions [1]. To measure comorbidities, previous studies have used various data sources, including the hospital discharge abstract database (DAD) [2–7], physician claims [8–13], and drug dispensation databases [14,15], as well as different durations of retrospective observation. DAD is often used to measure Charlson comorbidities and evaluate their association with mortality, length of stay, and health care costs [16–19]. DAD, however, only records comorbidities for hospitalized patients, which is problematic since many patients with chronic conditions are managed in outpatient settings. As such, comorbidities that are defined using only one database are likely to be underestimated.

Researchers have tried various approaches to defining comorbidities accurately. Wang et al. [12] developed strategies for defining comorbidities by using Medicare and Medicaid claims data. Researchers in the United States [20] and Australia [21] explored the length of “look back” required for defining comorbidities and their associations with clinical outcomes, as previous studies have indicated that the prevalence of comorbidities varies depending on the data source and length of observation period used, which in turn affects adjusted clinical outcomes.

In our study of the occurrence, management, and outcomes related to hypertension, we have found that the majority of patients with hypertension are identified through physician claims data, while patients with severe hypertension are mainly identified from DAD data [22]. Considering the long incubation period from hypertension diagnosis to the manifestation of poor clinical outcomes, along with the number of patients who are managed in outpatient settings, we aimed to maximize the length of follow-up for outcomes and minimize the duration of observation for defining comorbidities by fully using available health information. Unfortunately, to the best of our knowledge, no existing studies have compared different data sources and durations for estimating the burden of comorbidities, and the impact these approaches have on the model performance of risk adjusted outcomes among patients with hypertension. Therefore, we conducted this study to define Charlson comorbidities using DAD and physician claims data for four durations of retrospective observation (i.e., 6 months, 1 year, 2 years, and 3 years) to explore the impact of these different approaches on mortality and cardiovascular disease outcomes, among patients with newly diagnosed hypertension.

Study Population and Method

Data Sources

We linked data from DAD, physician claims, population registry, and death registry in the province of Alberta, Canada from April 1, 1994 to March 31, 2010 (i.e., fiscal year) using unique personal health numbers. Data from the DAD includes all inpatients in Alberta and contains up to 16 diagnoses coded according to the International Classification of Diseases, 9th revision, Clinical Modification (ICD-9-CM) prior to April 1, 2002, and up to 25 diagnoses coded according to the ICD-10 Canadian Modification (ICD-10-CA) since April 1, 2002. In Alberta, physicians submit billing claims for services to the provincial Government insurance program, regardless of their service location. When submitting these claims, at least one and up to three diagnoses, coded in ICD-9, must be provided. Physician claims data captures clinical information from patients at emergency departments, hospitals, and outpatient clinics who are covered by the Alberta provincial insurance program. Due to this universal insurance program, the program registry (also called the population registry) covers nearly all Alberta residents and contains information such as personal health number, age, sex and postal code. The death registry is updated regularly and includes an individual’s date and location of death.

Study Population and Outcomes

We identified patients with hypertension in our linked administrative data sources using a previously validated ICD algorithm: “two claims within 2 years or 1 hospitalization” (sensitivity 75%, specificity 94%, positive predictive value 81%, and negative predictive value 92%) [23]. Patients with pregnancy-induced hypertension were excluded [23]. To identify newly diagnosed (incident) cases of hypertension, we employed a 3-year washout period so as not to misclassify prevalent cases as incident. We assigned the index date for hypertension diagnosis as the first date of a physician visit or hospitalization with a hypertension diagnosis code. To ensure at least one year of follow-up for outcomes, we included incident cases for the fiscal years 1997 to 2008, resulting in up to 12 years of follow-up for the study population. We excluded patients with hypertension who were not residents of Alberta or who were less than 20 years of age at the time of diagnosis.
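The case definition and washout rule can be sketched in a few lines of Python. This is an illustration only, not the code used in the study: the function names, the day counts used for "2 years" and "3 years", and the choice to take the earliest hypertension-coded record as the index date are all assumptions made for the example.

```python
from datetime import date, timedelta

TWO_YEARS = timedelta(days=730)    # assumed day count for "within 2 years"
WASHOUT = timedelta(days=3 * 365)  # assumed day count for the 3-year washout


def index_date(claim_dates, hosp_dates):
    """Return the index date if the case definition is met, else None.

    Case definition: one hospitalization, or two physician claims
    whose dates fall within 2 years of each other.
    """
    claims = sorted(claim_dates)
    # Adjacent claims are the closest pairs, so checking them suffices.
    two_claims = any(b - a <= TWO_YEARS for a, b in zip(claims, claims[1:]))
    if not hosp_dates and not two_claims:
        return None
    # Assumption: the index date is the earliest hypertension-coded record.
    return min(list(claim_dates) + list(hosp_dates))


def is_incident(idx, data_start):
    """Treat a case as newly diagnosed only if a full washout precedes it."""
    return idx is not None and idx - data_start >= WASHOUT
```

With data starting on April 1, 1994, this rule admits index dates from April 1, 1997 onward, which matches the first fiscal year of incident cases described above.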

Outcomes included all-cause mortality, determined from death registry data, and cardiovascular disease (CVD), defined as myocardial infarction, heart failure, or stroke. We linked the study population with data from DAD and used validated coding algorithms to define myocardial infarction (ICD-9: 410.x, 412.x; ICD-10-CA: I21.x, I22.x, I25.2), heart failure (ICD-9: 428.x; ICD-10-CA: I09.9, I11.0, I13.0, I13.2, I25.5, I42.0, I42.5-I42.9, I43.x, I50.x, P29.0), and stroke (ICD-9: 362.3, 430.x, 431.x, 433.x-436.x, excluding 433.x0 and 434.x0; ICD-10-CA: H34.1, I60.x, I61.x, I63.x, I64.x, G45.x in any diagnosis field) [24–26]. Survival time was determined using the date of hypertension diagnosis and date of death/admission for cardiovascular disease. Patients were excluded if they moved out of province or reached the end of the observation period of March 31, 2010.
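As an illustration of how such a coding algorithm can be applied, the sketch below flags CVD outcomes from a list of ICD-10-CA diagnosis codes by prefix matching. The code lists are transcribed from the paragraph above. The function name, the normalization step, and the choice to search every diagnosis field for all three outcomes are assumptions made for the example, and the ICD-9 lists with their exclusions are left out for brevity.

```python
# ICD-10-CA prefixes transcribed from the text; dots are removed before matching.
CVD_PREFIXES = {
    "myocardial infarction": ("I21", "I22", "I252"),
    "heart failure": ("I099", "I110", "I130", "I132", "I255", "I420",
                      "I425", "I426", "I427", "I428", "I429",
                      "I43", "I50", "P290"),
    "stroke": ("H341", "I60", "I61", "I63", "I64", "G45"),
}


def cvd_outcomes(diagnosis_codes):
    """Return the set of CVD outcomes flagged by any diagnosis field."""
    found = set()
    for code in diagnosis_codes:
        normalized = code.replace(".", "").upper()
        for outcome, prefixes in CVD_PREFIXES.items():
            # str.startswith accepts a tuple of candidate prefixes.
            if normalized.startswith(prefixes):
                found.add(outcome)
    return found
```

Prefix matching handles the ".x" wildcards directly: "I50" matches I50.0 and I50.9 alike, while a fully specified code such as I25.2 matches only itself and its subdivisions.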

Comorbidity Definitions

Charlson comorbidities were defined using validated ICD-9 and ICD-10 coding algorithms [26]. We applied these coding algorithms to our three data sources (i.e., DAD, physician claims, and both) across four retrospective periods of observation (i.e., 6 months, 1 year, 2 years, and 3 years from the date of hypertension diagnosis). Thus, we evaluated 12 approaches to defining Charlson comorbidities. We did not use Elixhauser comorbidities, which contain more conditions and are better predictors of long-term mortality than Charlson comorbidities [27], because the majority of hypertension patients are captured from the physician claims database. Diagnoses in this database are coded using ICD-9, up to 4 digits, whereas defining Elixhauser comorbidities requires ICD-9-CM diagnosis codes of up to 5 digits (a more precise coding system than ICD-9).
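The 12 approaches amount to crossing three source sets with four look-back windows. A minimal sketch for a single comorbidity in a single patient follows. The record layout, the function names, the day counts assigned to each window, and the inclusion of the index date in the window are all assumptions made for illustration.

```python
from datetime import date, timedelta
from itertools import product

SOURCES = {"DAD": {"DAD"}, "claims": {"claims"}, "both": {"DAD", "claims"}}
# Assumed day counts for the four look-back windows.
LOOK_BACK_DAYS = {"6 months": 183, "1 year": 365, "2 years": 730, "3 years": 1095}


def has_comorbidity(records, index, sources, days):
    """True if any record from an allowed source falls in the look-back window.

    records: iterable of (source, service_date) pairs for one comorbidity.
    The window is [index - days, index]; including the index date itself
    is an assumption.
    """
    start = index - timedelta(days=days)
    return any(src in sources and start <= d <= index for src, d in records)


def all_approaches(records, index):
    """Evaluate the 3 x 4 = 12 source/look-back combinations for one patient."""
    return {
        (s, w): has_comorbidity(records, index, SOURCES[s], LOOK_BACK_DAYS[w])
        for s, w in product(SOURCES, LOOK_BACK_DAYS)
    }
```

A patient whose only hospital record for a condition lies 700 days before the index date would be flagged under the 2-year and 3-year DAD approaches but missed under the 6-month and 1-year ones, which is the mechanism behind prevalence rising with look-back length.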

Statistical Methods

The prevalence of Charlson comorbidities was calculated for each of our 12 approaches. We also fitted logistic regression models for one-year all-cause mortality and CVD hospitalization for each of these 12 approaches. Age- and sex-adjusted odds ratios (ORs) were estimated for each comorbidity. We then used Cox proportional hazards regression models for all-cause mortality and CVD hospitalization and estimated the hazard ratio (HR) for each comorbidity after adjusting for age and sex.

We assessed model performance using C-statistics for logistic regression and the concordance index (CI) for Cox proportional hazards regression [28]. We used a 10-fold cross-validation method to evaluate model performance. The C-statistics, CI, and their 95% confidence intervals were estimated using the bootstrap method with 500 resamples. All analyses were conducted using SAS version 9.4 (SAS Institute Inc., USA).
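For readers unfamiliar with these measures, the following sketch computes a C-statistic for a binary outcome and a bootstrap interval around it. It is not the SAS procedure used in the study. The percentile form of the bootstrap is an assumption, since the text does not name the variant, and the 10-fold cross-validation step and the concordance index for the Cox model [28] are not shown.

```python
import random


def c_statistic(scores, outcomes):
    """Share of (event, non-event) pairs in which the event scores higher.

    Pairs with tied scores count as half-concordant.
    """
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    non_events = [s for s, y in zip(scores, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = sum(
        1.0 if e > n else 0.5 if e == n else 0.0
        for e in events for n in non_events
    )
    return concordant / (len(events) * len(non_events))


def bootstrap_ci(scores, outcomes, resamples=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the C-statistic (assumed variant)."""
    c_statistic(scores, outcomes)  # validates that both classes are present
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [outcomes[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples lacking events or non-events
            stats.append(c_statistic([scores[i] for i in idx], ys))
    stats.sort()
    lower = stats[int((alpha / 2) * resamples)]
    upper = stats[int((1 - alpha / 2) * resamples) - 1]
    return lower, upper
```

Here `scores` would be predicted probabilities from a fitted model and `outcomes` the observed 0/1 events. The pairwise loop is quadratic in sample size, so a cohort of this study's size would need a rank-based formulation instead.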

This study was approved by the Conjoint Health Research Ethics Board (CHREB), University of Calgary (approval number: REB13-0051). CHREB also approved a waiver of consent because this study analyzed health administrative data, and patient records and information were anonymized and de-identified in these databases prior to analysis.

Results

Of the 759,040 patients identified with hypertension between the 1994 and 2009 fiscal years, we included 456,263 patients with newly diagnosed hypertension. As shown in Table 1, 9.9% of these were identified using data only from DAD, 86.8% using data only from physician claims data, and 3.4% using both DAD and claims between 1997 and 2008. The follow-up period ranged from 0 to 12 years (mean: 5.7 years, median: 5.5 years) with a mortality rate of 2.8 per 1000 person-years.

Table 1. Characteristics of patients with newly diagnosed hypertension.

https://doi.org/10.1371/journal.pone.0162074.t001

The prevalence of each comorbidity was higher when both DAD and physician claims data were used than when either source was used alone (Table 2). For the 1 year ‘look back’ period, the prevalence of having at least one Charlson comorbidity was almost twice as high in claims data as in DAD data, and was even higher when both DAD and claims data sources were used together (DAD: 15.5%, claims: 30.0%, and both: 32.6%). The prevalence also increased with the length of retrospective observation, although the increase from 2 to 3 years was smaller than the increase from 1 year to 2 years.

Table 2. Prevalence (%) of comorbidities among patients with newly diagnosed hypertension by data source and duration of retrospective observation.

https://doi.org/10.1371/journal.pone.0162074.t002

Risk-adjusted ORs and HRs for the Charlson comorbidities varied slightly across data sources and retrospective periods in models that used mortality (Table 3) and CVD hospitalization as the outcome. The approach that used both DAD and physician claims data had the highest C-statistics, followed by DAD data only, and physician claims data only (Table 4). For each data source, the C-statistics and CI improved for CVD hospitalization and one year mortality when the retrospective period was increased from 6 months to one year or more. The 3 year DAD and physician claims approach had the highest C-statistics and CI among these 12 approaches. The C-statistics and CI were lower for modeling CVD hospitalization as an outcome than for mortality (Table 4).

Table 3. Age and sex adjusted odds ratio (OR) and adjusted hazard ratio (HR) of mortality and cardiovascular disease (CVD) hospitalization by duration of retrospective observation among patients with newly diagnosed hypertension.

https://doi.org/10.1371/journal.pone.0162074.t003

Table 4. The C-statistics and concordance index (CI) with 95% confidence intervals using 10-fold cross validation for mortality and cardiovascular disease hospitalization among newly diagnosed hypertension by data sources and ‘look back’ intervals.

https://doi.org/10.1371/journal.pone.0162074.t004

Discussion

We found that the use of DAD data alone underestimated the prevalence of comorbidities, while use of both physician claims and DAD data with a 3-year retrospective observation period yielded the highest prevalence. Model performance for one-year mortality and cardiovascular disease hospitalization improved statistically significantly when both DAD and physician claims data were used for one year or longer, compared with using only one of these sources.

Preen et al. [21] found that less than 50% of comorbidities that were recorded in the preceding five years were captured in the index hospital record. Another study in the United States reported that the prevalence of comorbidities increased from 10% when using only inpatient data to 25% when using both inpatient and physician claims data [8]. Our study supports the findings from this literature. We found that 75% of comorbidities are missing when only DAD data are used compared to when both physician claims and DAD data are used, with a retrospective observation period of three years. The prevalence of having at least one Charlson comorbidity was 9.2% when only the index hospital record in DAD data was used. This increased to 18.7% when we applied a 3-year retrospective observation period to DAD data. Using physician claims data further improved the identification of comorbidities, as the prevalence reached 43.2%. These studies clearly suggest that DAD and physician claims data with a long duration of observation should be used to capture the comorbidity profile.

We found that, based on C-statistics and CI, the choice of data source had a greater impact on risk adjustment model performance for mortality and CVD outcomes than the duration of the retrospective observation period. One study in the United States reported that C-statistics remained the same between a 1-year and a 2-year observation period for combined inpatient and outpatient data [20]. Using inpatient data, Preen et al. [21] in Australia reported that their C-statistics showed little to no improvement from a 1-year to a 5-year observation period. In Canada, Lee et al. [29] found that increasing the duration of the retrospective observation period increased the detection of comorbidities, but only marginally improved their predictive model performance for 30-day mortality. There are several possible explanations for this. First, inpatients with hypertension are likely to be sicker than outpatients with hypertension. As such, patients with multiple conditions and poorer outcomes are captured in DAD data. Second, regardless of the service location, physician claims record conditions not only from outpatients but also from inpatients and emergency department visitors. There is therefore a large overlap between conditions that are recorded in DAD and claims data. Third, outcomes of hypertension are determined by many factors, such as socio-demographic and clinical characteristics. However, administrative data do not capture many important factors, such as the severity of a disease. As an index of case-mix, Charlson comorbidities that are defined using administrative data may have reached their maximum capacity for predicting clinical outcomes, such as CVD, even where prevalence increases with duration. Regardless of how the data may be enhanced through a longer duration, there is little margin left to improve risk adjustment model performance. Fourth, patients with severe conditions visit physicians frequently, so their comorbidities can be captured within a short duration.

We found that the ORs and HRs for comorbidities slightly decreased with duration of observation for both mortality and CVD hospitalizations. This decrease may in part be due to false positive comorbidities, which dilute the effect of comorbidities on poor outcomes. Patients with mild comorbidities, however, are less likely to visit their physicians and more likely to have a longer survival than patients with severe comorbidities.

Limitations to this study are as follows. First and foremost, we did not validate comorbidities that were identified in physician claims data. Previous studies have indicated that validation for chronic conditions varies for different conditions and data sources. As the observation period increased, more false positive chronic conditions were included due to ICD coding errors. These false positive conditions might influence the discriminatory ability for poor outcomes. Secondly, comorbidities were defined prior to hypertension diagnosis. Some comorbid conditions may have occurred after the diagnosis of hypertension and contributed to poor outcomes. We did not account for time-dependent variables. Thirdly, we followed patients with incident hypertension for up to 12 years. The HRs might have changed with a longer follow-up period. Lastly, hypertension and comorbidities in this study were identified using Canadian administrative health data from a universal health insurance program. Thus, the findings from our study may not be generalizable to countries with different healthcare systems.

In conclusion, using a combination of DAD and physician claims data substantially improved the capture of chronic comorbidities. Prevalence increased significantly as the retrospective observation period lengthened. A combination of DAD and physician claims data with an observation duration of one year or longer improves predictive model performance for cardiovascular disease hospitalization and one-year mortality outcomes, because many chronic conditions are managed in outpatient clinical settings.

Acknowledgments

The following are members of the Hypertension and Outcomes Surveillance Team of the Canadian Hypertension Education Program: Oliver Baclic, Gillian Bartlett, Debra Butt, Norm Campbell, Guanmin Chen, Sulan Dai, Brenda Hemmelgarn, Michael Hill, Helen Johansen, Nadia Khan, Lisa Lix, Finlay McAlister, Jay Onysko, Hude Quan, Mark Smith, Larry Svenson, Gary Teare, Karen Tu, Robin Walker, Andy Wielgosz. We thank Kelsey Lucyk for her help in correcting the grammar and polishing the sentences of this paper.

This study is based in part on deidentified data provided by Canadian provincial health ministries. The interpretation and conclusions contained herein are those of the researchers and do not represent the views of these provincial governments. The opinions, results, and conclusions reported in this article are those of the authors and are independent from the funding sources.

H.Q. and F.M. receive salary support from Alberta Innovates–Health Solutions. F.M. holds the University of Alberta Chair in Cardiac Outcomes Research. K.T. is supported by a Fellowship in Primary Care Research from the Canadian Institutes of Health Research. N.C. holds the Heart and Stroke Foundation of Canada CIHR Chair in Hypertension Prevention and Control. B.H. is supported by the Roy and Vi Baay Chair in Kidney Research.

Author Contributions

  1. Conceived and designed the experiments: GC LL KT BH NC FM HQ.
  2. Performed the experiments: GC KT NC HQ.
  3. Analyzed the data: GC LL KT BH FM HQ.
  4. Contributed reagents/materials/analysis tools: GC KT NC HQ.
  5. Wrote the paper: GC LL KT BH NC FM HQ.

References

  1. Sharabiani MT, Aylin P, Bottle A. Systematic review of comorbidity indices for administrative data. Med Care 2012; 50: 1109–1118. pmid:22929993
  2. Austin PC, Stanbrook MB, Anderson GM, Newman A, Gershon AS. Comparative ability of comorbidity classification methods for administrative data to predict outcomes in patients with chronic obstructive pulmonary disease. Ann Epidemiol 2012; 22: 881–887. pmid:23121992
  3. Howell S, Coory M, Martin J, Duckett S. Using routine inpatient data to identify patients at risk of hospital readmission. BMC Health Serv Res 2009; 9: 96. pmid:19505342
  4. Li B, Evans D, Faris P, Dean S, Quan H. Risk adjustment performance of Charlson and Elixhauser comorbidities in ICD-9 and ICD-10 administrative databases. BMC Health Serv Res 2008; 8: 12. pmid:18194561
  5. Lix LM, Quail J, Teare G, Acan B. Performance of comorbidity measures for predicting outcomes in population-based osteoporosis cohorts. Osteoporos Int 2011; 22: 2633–2643. pmid:21305268
  6. Quan H, Sundararajan V, Halfon P, Fong A, Burnand B, Luthi J, et al. Coding algorithms for defining comorbidities in ICD-9-CM and ICD-10 administrative data. Med Care 2005; 43: 1130–1139. pmid:16224307
  7. Southern DA, Quan H, Ghali WA. Comparison of the Elixhauser and Charlson/Deyo methods of comorbidity measurement in administrative data. Med Care 2004; 42: 355–360. pmid:15076812
  8. Klabunde CN, Harlan LC, Warren JL. Data sources for measuring comorbidity: a comparison of hospital records and medicare claims for cancer patients. Med Care 2006; 44: 921–928. pmid:17001263
  9. Li P, Kim MM, Doshi JA. Comparison of the performance of the CMS Hierarchical Condition Category (CMS-HCC) risk adjuster with the Charlson and Elixhauser comorbidity measures in predicting mortality. BMC Health Serv Res 2010; 10: 245. pmid:20727154
  10. Radley DC, Gottlieb DJ, Fisher ES, Tosteson AN. Comorbidity risk-adjustment strategies are comparable among persons with hip fracture. J Clin Epidemiol 2008; 61: 580–587. pmid:18471662
  11. Schneeweiss S, Wang PS, Avorn J, Maclure M, Levin R, Glynn RJ. Consistency of performance ranking of comorbidity adjustment scores in Canadian and U.S. utilization data. J Gen Intern Med 2004; 19: 444–450. pmid:15109342
  12. Wang PS, Walker A, Tsuang M, Orav EJ, Levin R, Avorn J. Strategies for improving comorbidity measures based on Medicare and Medicaid claims data. J Clin Epidemiol 2000; 53: 571–578. pmid:10880775
  13. Yan Y, Birman-Deych E, Radford MJ, Nilasena DS, Gage BF. Comorbidity indices to predict mortality from Medicare data: results from the national registry of atrial fibrillation. Med Care 2005; 43: 1073–1077. pmid:16224299
  14. Clark DO, Von Korff M, Saunders K, Baluch WM, Simon GE. A chronic disease score with empirically derived weights. Med Care 1995; 33: 783–795. pmid:7637401
  15. Johnson RE, Hornbrook MC, Nichols GA. Replicating the chronic disease score (CDS) from automated pharmacy data. J Clin Epidemiol 1994; 47: 1191–1199. pmid:7722553
  16. Leal JR, Laupland KB. Validity of ascertainment of co-morbid illness using administrative databases: a systematic review. Clin Microbiol Infect 2010; 16: 715–721. pmid:19614717
  17. Perkins AJ, Kroenke K, Unutzer J, Katon W, Williams JW Jr., Hope C, et al. Common comorbidity scales were similar in their ability to predict health care costs and mortality. J Clin Epidemiol 2004; 57: 1040–1048. pmid:15528055
  18. Quan H, Li B, Couris CM, Fushimi K, Graham P, Hider P, et al. Updating and validating the Charlson comorbidity index and score for risk adjustment in hospital discharge abstracts using data from 6 countries. Am J Epidemiol 2011; 173: 676–682. pmid:21330339
  19. Quan H, Parsons GA, Ghali WA. Validity of information on comorbidity derived from ICD-9-CM administrative data. Med Care 2002; 40: 675–685. pmid:12187181
  20. Zhang JX, Iwashyna TJ, Christakis NA. The performance of different lookback periods and sources of information for Charlson comorbidity adjustment in Medicare claims. Med Care 1999; 37: 1128–1139. pmid:10549615
  21. Preen DB, Holman CD, Spilsbury K, Semmens JB, Brameld KJ. Length of comorbidity lookback period affected regression model performance of administrative health data. J Clin Epidemiol 2006; 59: 940–946. pmid:16895817
  22. Quan H, Chen G, Tu K, Bartlett G, Butt DA, Campbell NR, et al. Outcomes among 3.5 million newly diagnosed hypertensive Canadians. Can J Cardiol 2013; 29: 592–597. pmid:23465341
  23. Quan H, Khan N, Hemmelgarn BR, Tu K, Chen G, Campbell NR, et al. Validation of a case definition to define hypertension using administrative data. Hypertension 2009; 54: 1423–1428. pmid:19858407
  24. Kokotailo RA, Hill MD. Coding of stroke and stroke risk factors using international classification of diseases, revisions 9 and 10. Stroke 2005; 36: 1776–1781. pmid:16020772
  25. Quach S, Blais C, Quan H. Administrative data have high variation in validity for recording heart failure. Can J Cardiol 2010; 26: 306–312. pmid:20931099
  26. Quan H, Li B, Saunders LD, Parsons GA, Nilsson CI, Alibhai A, et al. Assessing validity of ICD-9-CM and ICD-10 administrative data in recording clinical conditions in a unique dually coded database. Health Serv Res 2008; 43: 1424–1441. pmid:18756617
  27. Southern DA, Quan H, Ghali WA. Comparison of the Elixhauser and Charlson/Deyo methods of comorbidity measurement in administrative data. Med Care 2004; 42: 355–360. pmid:15076812
  28. Gönen M, Heller G. Concordance probability and discriminatory power in proportional hazards regression. Biometrika 2005; 92: 965–970.
  29. Lee DS, Donovan L, Austin PC, Gong Y, Liu PP, Rouleau JL, et al. Comparison of coding of heart failure and comorbidities in administrative and clinical data for use in outcomes research. Med Care 2005; 43: 182–188. pmid:15655432