Review Article
Administrative database research has unique characteristics that can risk biased results
Introduction
Health care provision is becoming increasingly digitized. In most jurisdictions, patient visits are logged in registration systems. The dates of physician visits, laboratory tests, and radiological investigations are recorded in physician claims databases. The diagnoses, procedures, and simple outcomes of visits to emergency departments or admissions to hospital are documented in hospitalization databases. Each of these systems leaves a trail of digital information that describes (to varying degrees of detail) a patient’s course through a health care system. These data can be used to conduct research studies that can be termed “administrative database research.”
As in other types of observational research, the overarching goal of administrative database research is to describe a particular measure (or variable), with or without its relationship to another measure. Many of the guidelines available for assessing the internal validity of observational research [1] apply to administrative database research. However, these studies also raise several unique issues that the writer must address and the reader must evaluate to establish internal validity [2]. If these issues are not addressed or considered, threats to the validity of administrative database research may persist. In this article, we discuss five issues that should be considered whenever administrative database research is written or read.
Description of the data sets used for study
In studies using primary data collection, the methods section describes steps taken to collect the data used to create the study analytical data set. Key issues here include the sampling frame and sampling methods as well as the inclusion and exclusion criteria. This information helps readers understand which people were considered for inclusion in the study and has important implications for determining the internal and, especially, external validity of the study. In administrative database
Reporting diagnostic and procedural code accuracy using meaningful statistics
Administrative data use codes to identify diagnoses or procedures that are often used for research studies. In a systematic review of administrative database research [3], we found that 76% of administrative database studies used diagnostic or procedural codes to define patient cohorts, exposures, or outcomes. HRAs (or, occasionally, physicians) review health records to identify diagnoses and procedures that have been documented therein. They then use standard coding systems to substitute the
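As a rough illustration of what "meaningful statistics" for code accuracy can look like, the sketch below computes common accuracy measures from a 2×2 table that compares a diagnostic code against a reference standard such as chart review. The function names and the counts are hypothetical and are not taken from the article; the second function shows how the positive predictive value of the same code shifts when disease prevalence changes.

```python
def code_accuracy(tp, fp, fn, tn):
    """Accuracy of a diagnostic code versus a reference standard.

    tp/fp/fn/tn are counts from a 2x2 validation table (hypothetical here).
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),                      # positive predictive value
        "npv": tn / (tn + fn),                      # negative predictive value
        "lr_pos": sensitivity / (1 - specificity),  # positive likelihood ratio
        "lr_neg": (1 - sensitivity) / specificity,  # negative likelihood ratio
    }


def ppv_at_prevalence(sensitivity, specificity, prevalence):
    """Positive predictive value implied by Bayes' theorem at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Made-up validation sample: 1,000 charts, 100 with the disease.
stats = code_accuracy(tp=80, fp=20, fn=20, tn=880)
for name, value in stats.items():
    print(f"{name:12s} {value:.3f}")

# Same code, applied where the disease is ten times rarer.
ppv_rare = ppv_at_prevalence(stats["sensitivity"], stats["specificity"], 0.01)
print(f"PPV at 1% prevalence: {ppv_rare:.3f}")
```

Sensitivity and specificity stay fixed in this sketch while the predictive value falls sharply at the lower prevalence, which is one reason a single accuracy figure reported without its validation setting can mislead.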
Statistical significance vs. clinical significance
The issue of statistical significance vs. clinical significance is not unique to administrative database research. However, these studies often have very large sample sizes, thereby highlighting this issue and making it a recurrent theme in such studies.
Table 1 illustrates the influence that study sample size can have on P-values for statistical testing. In this example, two equally sized groups have a very similar baseline prevalence of a binary trait (49.9% vs. 50.1%). Data in the table show
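The effect of sample size on P-values for a fixed, clinically trivial difference can be reproduced with a short calculation. This is a minimal sketch that assumes a two-sided, pooled two-proportion z-test; the article does not state which test underlies Table 1, so the test, the function name, and the sample sizes chosen here are assumptions for illustration only.

```python
import math


def two_proportion_p_value(p1, p2, n_per_group):
    """Two-sided P-value for a pooled two-proportion z-test, equal group sizes."""
    pooled = (p1 + p2) / 2  # pooled proportion when both groups have the same n
    se = math.sqrt(2 * pooled * (1 - pooled) / n_per_group)
    z = (p2 - p1) / se
    # Two-sided tail area of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


# Same prevalences as in the text (49.9% vs. 50.1%), increasing sample sizes.
sample_sizes = [1_000, 100_000, 1_000_000, 10_000_000]
p_values = {n: two_proportion_p_value(0.499, 0.501, n) for n in sample_sizes}
for n, p in p_values.items():
    print(f"n per group = {n:>10,}  P = {p:.4g}")
```

The absolute difference between groups is 0.2 percentage points in every row; only the number of patients changes, yet the P-value moves from clearly non-significant to far below conventional thresholds.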
Time-dependent bias
Patient-level variables can change value during observation. Such “time-dependent” variables can be termed “baseline immeasurable” if their value cannot be determined at baseline. Biased conclusions can occur when these variables are analyzed as if their values were known at the start of patient observation.
In this situation, a patient’s outcome will influence the value of their time-dependent variable. Consider a binomial (0/1) time-dependent covariate indicating the presence or absence of a
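A small simulation can make this bias concrete. The sketch below is an illustrative assumption, not the authors' analysis: every patient has the same death hazard regardless of treatment (so the true rate ratio is 1), and treatment can only start while the patient is still alive. It compares a naive analysis, which treats "ever treated" as if it were known at baseline, with an analysis that splits person-time at the treatment start. All names and rates are hypothetical.

```python
import random


def simulate_rate_ratios(n_patients=20000, seed=1):
    """Return (naive, time-dependent) death rate ratios, treated vs. untreated."""
    rng = random.Random(seed)
    death_rate = 0.1  # same hazard for everyone: treatment has NO true effect
    start_rate = 0.2  # hazard of starting treatment during follow-up

    # Each tally holds [deaths, person-time].
    naive = {"treated": [0, 0.0], "untreated": [0, 0.0]}
    split = {"treated": [0, 0.0], "untreated": [0, 0.0]}

    for _ in range(n_patients):
        t_death = rng.expovariate(death_rate)
        t_start = rng.expovariate(start_rate)
        ever_treated = t_start < t_death  # treatment requires surviving to t_start

        # Naive: exposure fixed at baseline, all follow-up assigned to one group.
        group = "treated" if ever_treated else "untreated"
        naive[group][0] += 1
        naive[group][1] += t_death

        # Time-dependent: time before treatment counts as untreated time.
        if ever_treated:
            split["untreated"][1] += t_start
            split["treated"][0] += 1
            split["treated"][1] += t_death - t_start
        else:
            split["untreated"][0] += 1
            split["untreated"][1] += t_death

    def rate_ratio(tally):
        treated = tally["treated"][0] / tally["treated"][1]
        untreated = tally["untreated"][0] / tally["untreated"][1]
        return treated / untreated

    return rate_ratio(naive), rate_ratio(split)


naive_rr, time_dependent_rr = simulate_rate_ratios()
print(f"naive rate ratio:          {naive_rr:.2f}")
print(f"time-dependent rate ratio: {time_dependent_rr:.2f}")
```

Under these assumptions the naive analysis makes an ineffective treatment look strongly protective, because patients must survive long enough to be treated, while the time-dependent analysis returns a ratio close to 1.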
Accounting for clustering
Study samples derived from health administrative data are frequently subject to clustering. For example, consider a study that consists of patients hospitalized with an acute myocardial infarction (AMI) who were treated by physicians who practice within hospitals [18]. This study consists of data having a three-level structure of AMI patients nested within physicians nested within hospitals.
Researchers using health administrative data are frequently interested in determining the association
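One common way to gauge how much clustering matters is the design effect, which is not named in the text above and is introduced here only as an illustration. The sketch assumes a simplified two-level structure (patients within physicians), equal cluster sizes, and a hypothetical intraclass correlation; it uses the standard formula 1 + (m − 1) × ICC, where m is the number of patients per cluster.

```python
def design_effect(cluster_size, icc):
    """Variance inflation from clustering, assuming equal-sized clusters."""
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(n_patients, cluster_size, icc):
    """Number of independent observations the clustered sample is worth."""
    return n_patients / design_effect(cluster_size, icc)


# Hypothetical example: 10,000 AMI patients, 100 per physician, ICC = 0.05.
deff = design_effect(cluster_size=100, icc=0.05)
n_eff = effective_sample_size(n_patients=10_000, cluster_size=100, icc=0.05)
print(f"design effect:         {deff:.2f}")
print(f"effective sample size: {n_eff:.0f}")
```

Even a modest correlation among patients of the same physician can shrink the effective sample size considerably, so analyses that ignore clustering tend to report confidence intervals that are too narrow.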
Summary
Administrative database research offers extensive opportunities for health-related scientific studies. In this article, we discussed five issues that we believe are especially prominent in administrative database research. Writers of administrative database research should address and clarify these issues to avoid confusing or misinforming readers.
References (23)
Understanding secondary databases: a commentary on “Sources of bias for health state characteristics in secondary databases”. J Clin Epidemiol (2007).
Administrative database research infrequently uses validated diagnostic or procedural codes. J Clin Epidemiol (2011).
Time-dependent bias due to improper analytical methodology is common in prominent medical journals. J Clin Epidemiol (2004).
A reader’s guide to the evaluation of prognostic studies. Postgrad Med J (1996).
Use of biomarkers and surrogate endpoints in drug development and regulatory decision making: criteria, validation, strategies. Annu Rev Pharmacol Toxicol (2001).
Estimating the proportion of treatment effect explained by a surrogate marker. Stat Med (1997).
Statistical validation of intermediate endpoints for chronic diseases. Stat Med (1992).
Individual patient- versus group-level data meta-regressions for the investigation of treatment effect modifiers: ecological bias rears its ugly head. Stat Med (2002).
Variation of sensitivity, specificity, likelihood ratios and predictive values with disease prevalence. Stat Med (1997).
The interpretation of diagnostic data. Clinical epidemiology. A basic science for clinical medicine (1991).