Background Variations in inpatient medical care are typically attributed to system, hospital or patient factors. Little is known about variations at the physician level within hospitals. We described the physician-level variation in clinical outcomes and resource use in general internal medicine (GIM).
Methods This was an observational study of all emergency admissions to GIM at seven hospitals in Ontario, Canada, over a 5-year period between 2010 and 2015. Physician-level variations in inpatient mortality, hospital length of stay, 30-day readmission and use of ‘advanced imaging’ (CT, MRI or ultrasound scans) were measured. Physicians were categorised into quartiles within each hospital for each outcome and then quartiles were pooled across all hospitals (eg, physicians in the highest quartile at each hospital were grouped together). We report absolute differences between physicians in the highest and lowest quartiles after matching admissions based on propensity scores to account for patient-level variation.
Results The sample included 103 085 admissions to 135 attending physicians. After propensity score matching, the difference between physicians in the highest and lowest quartiles for in-hospital mortality was 2.4% (95% CI 0.6% to 4.3%, p<0.01); for readmission was 3.3% (95% CI 0.7% to 5.9%, p<0.01); for advanced imaging was 0.32 tests per admission (95% CI 0.12 to 0.52, p<0.01); and for hospital length of stay was 1.2 additional days per admission (95% CI 0.5 to 1.9, p<0.01). Physician-level differences in length of stay and imaging use were consistent across numerous sensitivity analyses and stable over time. Differences in mortality and readmission were consistent across most sensitivity analyses but were not stable over time and estimates were limited by sample size.
Conclusions Patient outcomes and resource use in inpatient medical care varied substantially across physicians in this study. Physician-level variations in length of stay and imaging use were unlikely to be explained by patient factors whereas differences in mortality and readmission should be interpreted with caution and could be explained by unmeasured confounders. Physician-level variations may represent practice differences that highlight quality improvement opportunities.
- hospital medicine
- health services research
- quality improvement
Variations in inpatient medical care are well documented and typically attributed to system, hospital or patient factors.1–6 This approach is reflected in the system-level focus that predominates in the quality improvement and patient safety literature.7 Measurement efforts have also focused on the outcomes of care delivered by individual surgeons8 9 and have described notable surgeon-level variations.10 Recently, Tsugawa and colleagues reported marked variations in physician spending in inpatient medical care in US hospitals.11 Higher spending was not associated with better clinical outcomes. Broad physician attributes, such as experience,12 13 sex,14 training15 and specialty,16 may have modest and mixed associations with variations in patient outcomes. However, the magnitude of physician-level variations in inpatient medical care, and the scope of variations across a range of process and outcome measures, have not been described. Research in this area is based on large administrative data sets,11–15 17 18 which lack clinical data, or smaller clinical cohorts,19 which are limited by sample size. Thus, it remains unclear whether physician-level variations in outcomes and resource use may be explained by patient-level differences.
Much of clinical care is driven by decisions made by individual physicians and their patients. A growing number of quality improvement initiatives, such as audit-and-feedback reports,20 target individual physicians. Understanding the degree of physician-level variations in care is important for understanding the potential effectiveness of physician-targeted quality improvement interventions. General internal medicine (GIM) inpatient care provides a unique opportunity to examine physician-level variation because admissions are non-elective and patients are ‘quasirandomly’ allocated to physicians based on work schedules.11 Thus, physician-level variation within hospitals may reflect practice differences rather than differences in patient characteristics. Moreover, GIM patients account for nearly 40% of emergency department admissions to hospital.21
The purpose of this study was first to measure physician-level variations in resource use and selected patient outcomes in GIM at seven hospitals over 5 years. Second, we used a granular clinical data set to explore whether physician-level variations may reflect real practice differences or are more likely explained by differences related to patient characteristics or case mix. We focused on within-hospital variations to isolate physician-level variation from system or hospital factors.
Design, setting and participants
This observational study included GIM inpatients at seven large hospital sites participating in the General Medicine Inpatient Initiative (GEMINI) in Toronto and Mississauga, two adjacent cities in Ontario, Canada.21 Participating hospitals included five academic institutions and two large community-based hospitals affiliated with the University of Toronto. All participating healthcare organisations are independent, publicly funded and provide tertiary and/or quaternary care. Public insurance covers the costs of hospital care and physician services and thus patients come from all socioeconomic backgrounds. Residents and medical students from the University of Toronto rotate through all the participating hospitals.
GIM services at GEMINI hospitals have been previously described in detail.21 Inpatient GIM care is provided in a hospitalist model, predominantly by clinical teaching teams in the academic centres and by non-teaching teams in the community hospitals. The attending physicians are internists (93%) and family physicians (7%) who typically cover the inpatient service in ‘blocks’ lasting between 1 and 4 consecutive weeks (most commonly 2 weeks). GIM admissions are almost entirely non-elective and through the emergency department. There are typically at least four physicians attending on GIM in parallel at each hospital at any given time. Patients are assigned to attending physicians ‘quasirandomly’ by the on-call internal medicine resident or staff physician at the time of admission in the emergency department. As a result, over large samples, patient characteristics should be balanced across physicians within a hospital. Thus, physician-level differences in clinical outcomes and resource use within a hospital can be attributed to differences in clinical practice rather than differences in patient characteristics or case mix. This assumption has informed several large analyses examining emergency physician and hospitalist practice patterns in the USA,11 13 14 22 but it has not been tested outside this setting or using clinical data to enhance risk adjustment.
Inclusions and exclusions
We included all GIM hospitalisations between 1 April 2010 and 31 March 2015, defined as patients who were either admitted to or discharged from the GIM service. GIM services were distinguished from other hospital services based on data extracted from hospital information systems.21 In order to maximise comparability between physicians within a hospital, we only included GIM visits for which the ‘most responsible physician’ was an internist (including internal medicine subspecialists) who attends on the inpatient GIM service. A study investigator at each hospital identified the physicians who attended on the inpatient GIM service during the study period. We excluded family physician hospitalists because their training, case mix and practice patterns may differ systematically from internists. We excluded GIM patients who were admitted from any route other than the emergency department (n=4213, 3.4%), to avoid elective admissions and interhospital transfers that might not have been assigned to an attending physician quasirandomly. Finally, we excluded admissions with hospital length of stay greater than 30 days (n=5423, 4.4%), because it is difficult to attribute their care to a single physician. Hospital admissions were attributed to the ‘most responsible physician’ as per the Canadian Institute for Health Information (CIHI) Discharge Abstract Database, defined as the physician who is ‘responsible for the care and treatment of the patient for the majority of the visit to the health care facility’.23 The most responsible physician is assigned retrospectively by the hospital (typically by a trained chart abstractor) after discharge and is a definition that has been used extensively in health services research.12 24–26 We included only physicians responsible for at least 100 admissions over the study period to avoid unstable estimates related to small sample size. Two of the hospitals participating in GEMINI exist within a larger healthcare organisation.
Transfers between hospitals within the organisation occurred in a small number (3.8%) of that organisation’s hospitalisations. In these cases, the hospitalisation was attributed to the admitting hospital and was only included in the physician-level analysis if the most responsible physician was located at the admitting hospital.
Data collection for GEMINI has previously been described in detail21 and included linking hospital administrative data with electronic clinical data at the level of individual hospital admissions. Demographic and clinical data, including diagnoses coded using the International Statistical Classification of Diseases and Related Health Problems, Tenth Revision, Canada (ICD-10-CA), were collected from hospital administrative databases as reported to CIHI. Laboratory and radiology data were extracted from hospital information systems.
We report four measures of clinical outcomes and resource use: inpatient mortality, hospital length of stay, 30-day readmission at any participating hospital and use of ‘advanced imaging’, defined as the number of diagnostic CT, MRI or ultrasound scans per admission. These were selected to represent important aspects of clinical care but are not intended to be direct indicators of quality or appropriateness. For example, in-hospital mortality is sometimes expected because delivering end-of-life care is an important part of GIM as many patients have life-limiting illnesses. Thus, we do not intend to suggest that a lower mortality rate or shorter hospital length of stay necessarily indicates higher quality care.
The following patient characteristics were used in multivariable adjustment and matching: age, sex, Charlson Comorbidity Index score,27 day of admission (categorised as weekday vs weekend), time of admission (categorised as daytime between 08:00 and 17:00 or night-time otherwise), fiscal year of admission, admission hospital, primary discharge diagnosis, admission to GIM within the previous 30 days and the Laboratory-based Acute Physiology Score (LAPS)28 which ranges from 0 to 256 (higher scores reflect a greater risk of inpatient and 30-day mortality). The combination of age, sex, LAPS and Charlson Comorbidity Index score has been validated as a predictor of inpatient mortality in Ontario, Canada (reported c-statistic 0.89).29 These covariates were included to adjust for patient-level differences in mortality risk. As we previously described,30 discharge diagnoses were categorised based on ICD-10-CA codes into clinically meaningful and mutually exclusive groups, using the Clinical Classifications Software tool V.2018.1.31
For our first research question, to measure physician-level variation, we report the unadjusted absolute differences between the physicians with the highest and lowest values on each outcome measure within each hospital. To test the statistical significance of physician-level differences at each hospital, we set the three lowest physicians as the reference group and performed pairwise comparisons between each physician and the reference group. To quantify physician-level variation across hospitals, we categorised physicians into quartiles within each hospital for each of the four outcome measures. We then pooled the quartiles across all hospitals and compared the highest and lowest quartiles, testing for statistical significance with the Kruskal-Wallis test for continuous outcomes and χ2 test for binary outcomes.
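The within-hospital quartile assignment described above can be illustrated with a minimal Python sketch. It is not the study's code (the analyses were performed in R). The function name `assign_quartiles`, the input layout and the rank-based handling of uneven group sizes and ties are all assumptions made for illustration.

```python
from collections import defaultdict

def assign_quartiles(physician_values):
    """Assign each physician a quartile (1 = lowest, 4 = highest) within their own hospital.

    physician_values: list of (hospital, physician, value) tuples, where value is
    the physician's summary on one outcome (e.g. mean length of stay).
    The rank-based split used here is an assumption; the source does not state
    how ties or uneven group sizes were handled.
    """
    by_hospital = defaultdict(list)
    for hospital, physician, value in physician_values:
        by_hospital[hospital].append((value, physician))

    quartiles = {}
    for rows in by_hospital.values():
        rows.sort()  # ascending by value, ties broken by physician label
        n = len(rows)
        for rank, (_, physician) in enumerate(rows):
            # Map ranks 0..n-1 onto quartiles 1..4
            quartiles[physician] = rank * 4 // n + 1
    return quartiles

def pool_quartiles(quartiles):
    """Group physicians from all hospitals by their within-hospital quartile."""
    pooled = defaultdict(list)
    for physician, q in quartiles.items():
        pooled[q].append(physician)
    return dict(pooled)
```

Because ranking happens inside each hospital before pooling, a physician in the pooled highest quartile is high relative to colleagues at the same site, which keeps hospital-level differences out of the comparison.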
For our second research question, to examine whether differences in patient characteristics contributed to the observed physician-level variations, we performed seven analyses. First, we report patient baseline characteristics for all admissions across physician quartiles to determine whether they were balanced by the quasirandom allocation of patients to physicians. For each patient characteristic, we calculated the standardised difference between each quartile (eg, we calculated the standardised difference between quartiles 1 and 2, 1 and 3, 1 and 4, 2 and 3, 2 and 4, and so on). Standardised difference was computed as the mean difference between quartiles divided by the SD across hospitalisations in both quartiles. We report the largest standardised difference across all the pairwise comparisons for each patient characteristic, referring to this as the ‘maximum standardized difference’. Standardised differences greater than 0.1 are considered meaningful markers of imbalance.32
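As an illustration of the ‘maximum standardized difference’ described above, the following Python sketch computes the standardised difference for every pair of quartiles and keeps the largest. It assumes that "the SD across hospitalisations in both quartiles" means the SD of the combined observations from the two quartiles being compared; other pooled-SD definitions exist and the source does not specify which was used. Function names are hypothetical.

```python
from itertools import combinations
from statistics import mean, stdev

def standardised_difference(a, b):
    """Absolute mean difference divided by the SD of the combined observations.

    Assumption: the denominator is the sample SD of all values in both groups.
    """
    sd = stdev(list(a) + list(b))
    return abs(mean(a) - mean(b)) / sd if sd > 0 else 0.0

def max_standardised_difference(groups):
    """groups: dict mapping quartile label -> list of values for one patient
    characteristic. Returns the largest pairwise standardised difference."""
    return max(
        standardised_difference(groups[i], groups[j])
        for i, j in combinations(sorted(groups), 2)
    )
```

A returned value above 0.1 for a given characteristic would flag imbalance under the threshold stated in the text.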
Second, we compared physicians in the highest and lowest quartiles of each outcome before and after restricting the comparison to a matched sample of hospital admissions to reduce bias that might arise from non-random assignment of patients to physicians attending in GIM. We matched admissions 1:1 based on patient age (categorised into 5-year intervals), sex, hospital site, fiscal year of admission and propensity score. The propensity score was calculated using logistic regression models predicting the probability of being in the highest physician quartile based on the baseline patient characteristics reported above. We compared estimates of physician-level differences before and after propensity score matching. We hypothesised that if patients were quasirandomly allocated to physicians, the estimates of physician-level differences would be similar before and after matching, suggesting that patient-level differences contributed little to the observed physician-level differences.
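The matching step can be sketched in Python as a greedy 1:1 nearest-neighbour match on the propensity score within exact strata (age band, sex, hospital site and fiscal year). This is only one plausible reading: the source does not state the matching algorithm, whether a caliper was applied or how ties were resolved, so the greedy strategy, the `caliper` parameter and the record layout below are assumptions. Propensity scores are taken as already estimated.

```python
from collections import defaultdict

def match_admissions(high, low, caliper=0.05):
    """Greedy 1:1 matching without replacement.

    high, low: lists of dicts with keys
        'id'      - admission identifier
        'stratum' - tuple of exact-match variables (age band, sex, site, fiscal year)
        'ps'      - precomputed propensity score
    caliper: maximum allowed propensity score distance (hypothetical value).
    Returns a list of (high_id, low_id) pairs.
    """
    pool = defaultdict(list)
    for adm in low:
        pool[adm["stratum"]].append(adm)

    pairs = []
    for adm in sorted(high, key=lambda a: a["ps"]):
        candidates = pool[adm["stratum"]]
        if not candidates:
            continue  # no lowest-quartile admission shares this stratum
        best = min(candidates, key=lambda c: abs(c["ps"] - adm["ps"]))
        if abs(best["ps"] - adm["ps"]) <= caliper:
            pairs.append((adm["id"], best["id"]))
            candidates.remove(best)  # each admission is used at most once
    return pairs
```

Outcome differences are then recomputed on the matched pairs only. If the matched and unmatched estimates are close, that is consistent with the quasirandom allocation hypothesis stated in the text.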
Third, because there is a large diversity in the conditions cared for in GIM,30 we compared admissions with only the 10 most common primary discharge diagnoses to reduce potential confounding related to case mix differences among uncommon diagnoses. We again compared the highest and lowest physician quartiles after performing the same matching algorithm based on patient age, sex, hospital, fiscal year of admission and propensity score in this restricted sample.
Fourth, we fit multivariable linear or logistic regression models to estimate physician-level outcomes after adjusting for the baseline patient characteristics described above. We calculated the Pearson correlation between the adjusted and unadjusted physician-level outcomes, with the expectation that a high degree of correlation suggests that patient-level factors explain relatively little physician-level variation.
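For reference, the Pearson correlation used in this comparison can be computed as in the short Python sketch below. The function name is illustrative and the inputs are assumed to be two equal-length lists holding each physician's unadjusted and adjusted estimate for one outcome.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```

A coefficient close to 1 means that adjustment changes the physician-level estimates very little, which is the pattern the authors interpret as patient-level factors explaining relatively little of the variation.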
Fifth, we performed a split sample validation to reduce the effects of unmeasured patient factors on physician-level differences and assess the stability in physician-level differences over time. The 5-year study sample was split into the first 2 years and last 3 years. Physicians were assigned to quartiles based on their within-hospital rankings on outcomes in the first part of the sample, and then patient characteristics and outcomes were reported for the same physician quartiles in the second part of the sample. To maximise stability of estimates, we included physicians with at least 100 admissions in each period. We performed a sample size calculation to estimate the sample that would be required to identify meaningful physician-level differences in mortality and readmission (see online supplementary appendix 1 for further details).
Sixth, we restricted the sample to hospitalisations with only a single attending physician to account for differences that might arise from misattribution in the ‘most responsible physician’ when there are multiple attending physicians involved in a patient’s care. We included only admissions with the same admitting, discharging and most responsible physician.
Seventh, to better understand physician-level differences in mortality, we excluded all patients with a palliative diagnostic code at hospital discharge (ICD-10-CA code Z51.5).
When reporting differences between physician quartiles, 95% CIs and p values were computed using adjusted SEs33 to account for clustering at the hospital level. All statistical analyses were performed using R V.3.5.0 (R Foundation for Statistical Computing).
There were 118 165 GIM admissions from the emergency department during the study period. After excluding physicians who were responsible for fewer than 100 admissions and admissions with length of stay greater than 30 days, the final study cohort included 103 085 admissions and 135 physicians (mean 764 admissions per physician, SD 533). The median patient age for admissions was 73 years (IQR 56–84), 50.6% were female and 42.6% had a Charlson Comorbidity Index score of 2 or greater (table 1).
Physician-level variations in care and outcomes
There were marked within-hospital variations in inpatient mortality, length of stay, 30-day readmission and use of advanced imaging at all hospitals (figure 1). The absolute difference between the highest and lowest ranked physicians within each hospital in inpatient mortality ranged from 3.1% to 6.9% (figure 1), difference in average length of stay ranged from 1.4 to 3.6 days per admission, difference in readmission rate ranged from 4.4% to 12.7% and difference in use of advanced imaging ranged from 0.3 to 0.9 tests per admission.
The absolute difference between physicians in the highest and lowest quartiles at all hospitals was 2.4% (95% CI 0.7% to 4.1%, p=0.007) for in-hospital mortality, 1.2 days (95% CI 0.6 to 1.9 days, p<0.001) per admission in length of stay, 3.9% (95% CI 1.2% to 6.7%, p=0.005) for readmission and 0.35 advanced imaging tests (95% CI 0.15 to 0.55 tests, p<0.001) per admission (table 2).
Patient-level differences and physician variations
The results of all seven analyses suggest that measured patient characteristics did not account for the observed physician-level variations in hospital length of stay and imaging use. Physician-level differences in mortality and readmission were consistent across all analyses except the split sample validation, which was limited by sample size. Patient characteristics, including predictors of mortality (eg, age, Charlson Comorbidity Index score and LAPS), were generally well balanced across physician quartiles for all outcomes, before and after matching (table 1, online supplementary tables A–K). Compared with unadjusted analyses, the physician-level differences were similar after propensity score matching and after propensity score matching in a sample restricted to only the 10 most common discharge diagnoses (table 2). There was a strong correlation between unadjusted and adjusted physician-level estimates for each outcome measure, with correlation coefficients ranging from 0.81 to 0.96 (p<0.001 for all measurements, table 3).
In the split sample validation, there remained significant physician-level differences in length of stay (difference between highest and lowest quartiles was 0.7 days per admission, 95% CI 0.04 to 1.3 days, p=0.04) and imaging use (difference between highest and lowest quartiles was 0.26 tests per admission, 95% CI 0.06 to 0.47 tests, p=0.01, online supplementary appendix tables M and O). There were no significant differences observed in the second period of the split sample for mortality or readmission (table 2, online supplementary tables L and N), but these findings were limited by sample size as only 80 physicians had enough hospitalisations in both periods to be included (see online supplementary appendix 1 for further details).
When the sample was restricted to admissions with only a single attending physician, 65 982 admissions were included (64% of the original sample). Patient-level characteristics were well balanced across all quartiles and physician-level differences were similar to the main analysis (table 2, online supplementary tables P–S).
After excluding patients with a ‘palliative’ diagnostic code, the difference in mortality between the highest and lowest physician quartiles was 1.8% (95% CI 0.6% to 3.0%, p=0.005, online supplementary table T).
This multicentre study of 103 085 hospital admissions demonstrates significant physician-level variations in inpatient clinical outcomes and resource use in GIM at all participating hospitals. Using granular clinical data, we found that measured patient differences did not substantially account for the observed physician-level variation. Over large samples, measured patient characteristics and case mix were well balanced across physicians within a hospital. Physician-level variations persisted among hospitalisations with only a single attending physician, which removed misattribution that could result from handovers in care. Physician-level variation in hospital length of stay and advanced imaging use were consistent across multiple sensitivity analyses and stable over time in a split sample validation. Physician-level variations in mortality and readmission rate were consistent across numerous analyses but could not be demonstrated to be stable over time, which is likely due in part to insufficient sample size. Thus, our main findings regarding mortality and readmission should be interpreted with caution as unmeasured confounders could account for the observed variations. Our findings suggest that there are meaningful physician-level variations in care that are not explained by system, hospital or patient factors. Further research is needed to determine whether such variations exist primarily in process measures (such as imaging use) or whether they can also be stably demonstrated for clinical outcomes (such as mortality or readmission).
When comparing physicians in the highest and lowest quartiles in the same hospital on a matched sample of admissions, we found absolute differences of 2.4% in inpatient mortality (representing one additional death per 42 admissions), 3.3% in readmission (one additional readmission per 30 admissions), 1.2 days per admission in length of stay and 0.32 advanced imaging tests per admission (one additional test per three admissions). These effects could not be attributed to differences in measured patient risk. In particular, the physician-level estimates of mortality were not substantially changed after adjusting for age, sex, Charlson Comorbidity Index score and LAPS, the combination of which has been validated as a predictor of mortality risk.28 29 Significant mortality differences remained after excluding deaths coded as ‘palliative’, suggesting that both unexpected and expected deaths contributed to the observed variation. However, physician-level differences in mortality were not consistent over time and thus should be interpreted with caution. Unmeasured factors, such as patient preferences regarding resuscitation, could account for the observed variation. We believe our findings are likely to be generalisable because the study sample included five hospitals across Toronto, Canada’s largest city, and the only two hospitals that provide GIM care in Mississauga, Canada’s sixth largest city. The organisation of inpatient GIM care in Ontario and most of Canada is similar to the hospitals included in this study. Given that GIM patients account for nearly 40% of emergency department admissions to hospital,21 in aggregate, these differences have large system-wide effects.
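The "one additional event per N admissions" figures quoted in this paragraph are the reciprocals of the absolute differences. A minimal Python check, with a hypothetical function name, is:

```python
def admissions_per_additional_event(absolute_difference):
    """Reciprocal of an absolute difference expressed per admission.

    For a proportion (e.g. 0.024 for 2.4%), this is the number of admissions
    per one additional event between the highest and lowest quartiles.
    """
    return 1.0 / absolute_difference

for label, diff in [("death", 0.024), ("readmission", 0.033), ("imaging test", 0.32)]:
    n = admissions_per_additional_event(diff)
    print(f"one additional {label} per {n:.1f} admissions")
```

Rounded to whole admissions, these reciprocals give the 42, 30 and 3 reported in the text.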
Clinical and policy implications
There is a well-developed body of literature examining clinical practice variation at the regional or hospital level.34–39 Surgeon-level variations have also been described in several settings.8 10 Less is known about physician-level variations in inpatient medicine. Studies have focused on comparing physicians across a range of characteristics. In general, there appear to be small or no differences in patient care and outcomes based on physician sex,14 experience,12 13 training15 or type of practice (hospitalist vs primary care).16 Patient care and outcomes differ between specialists and generalists for particular diseases, such as heart failure,19 24 40 41 however important case mix differences limit these comparisons. Because of this focus on specific physician characteristics, the magnitude and scope of physician-level variation in inpatient medicine has been poorly understood. Our study advances the understanding of quality measurement in inpatient medicine by identifying meaningful and stable physician-level differences in length of stay and imaging use and illuminating the need for further research regarding variations in mortality and readmission. This is particularly salient given the interest in audit and feedback as a means of quality improvement20 and highlights the importance of standardised multicentre clinical data sets to support quality measurement. In Ontario, Canada, GIM physicians have historically received little systematic feedback about their patterns of clinical practice and thus opportunities for reflection, improvement or standardisation of care based on observed variations have been limited.
Recently, a series of papers using administrative databases in the USA has asserted that quasirandomisation in non-elective care permits physician-level comparisons.11 13–15 22 To our knowledge, this hypothesis has not been tested outside of the USA or outside of administrative databases. We linked administrative data with detailed clinical data in a Canadian context and examined a more comprehensive set of patient characteristics than has been previously reported. For example, adjusting for laboratory data has been shown to account for substantial residual confounding compared with analyses based purely on administrative data,42 43 and is an important strength of our study. Physician-level differences were similar across multiple analyses to account for patient characteristics. This detailed exploration suggests that patients were indeed quasirandomly allocated to physicians resulting in balanced characteristics within a hospital and permitting valid comparisons of physician-level practice differences.
The magnitude of physician-level variations observed in this study suggests that further work to understand and address these variations should be a priority for quality improvement. Further research is needed to better understand the drivers of physician-level practice differences in GIM, given that physician age, sex, training or experience explain only small amounts of variation.12–16 Drivers may include different physician attitudes, skills or approaches to organising their practice. Alternatively, the observed physician-level clustering may reflect variations in the practices of different hospital wards or interdisciplinary teams.
Our study has several important limitations. First, we do not draw direct conclusions about appropriateness or quality of care. Although we identify large variations, we acknowledge that the extremes (eg, lowest mortality or shortest length of stay) may not reflect the highest quality care. Second, we collected data about readmissions at any participating study site and thus miss readmissions to other hospitals. In our region, 82% of hospital readmissions occur to the same site.44 Our data likely capture substantially more than 80% of all readmissions because we included readmissions to any participating hospital. It is unlikely that the limitations in this measure would affect physicians within a hospital differently and bias within-hospital comparisons. Third, we were unable to measure out-of-hospital mortality and thus choices about end-of-life care (eg, choosing to palliate patients who are terminally ill in hospital rather than transfer to another facility) may have affected physician-level differences. We attempted to address this by adjusting for validated predictors of mortality and by excluding patients with ‘palliative’ diagnostic codes, and results were consistent across these analyses. Fourth, we did not have sufficient sample size at the physician level to determine whether variations in mortality or readmission were stable over time. Fifth, resident physicians play an important role in the delivery of GIM care at most participating study sites and test ordering is known to vary among residents.45 We were unable to include trainee schedules and thus could not account for this potential source of variation. However, residents in participating hospitals rotate across clinical teams and are not paired directly with individual attending physicians. Thus, over large samples, resident-driven differences would likely be non-differential between physicians. 
Physician-level variability was similar across all participating sites, including the two community hospitals where residents are less involved in clinical care. Moreover, the attending physician is ultimately responsible for care that is delivered by the residents under their supervision. Sixth, we were unable to measure important patient factors, such as socioeconomic status, functional status, cognition or patient preferences, which could vary across physicians and affect outcomes. Given these limitations, we believe that data about physician-level practice variations in hospital medicine should be used for formative feedback and to identify opportunities for quality improvement but not for formal evaluations or remuneration.
Our study demonstrates that patient outcomes and resource use in inpatient medical care vary substantially across physicians. Physician-level variations in length of stay and imaging use are unlikely to be explained by patient factors and further research is needed to establish the stability of variations in mortality and readmission over time. Physician-level variations may represent practice differences that highlight quality improvement opportunities.
Twitter @AmolAVerma, @Adetsky, @FahadRazak
Correction notice The article has been corrected since it was published online first. The Funding and the Disclaimer statements have been updated.
Contributors The study was designed by AAV and FR with input from all coauthors. YG and HYJ performed statistical analysis. The manuscript was drafted by AAV and all coauthors provided critical revision for important intellectual content and input in writing. AAV, YG, HYJ, AW, TT, SR, LLS, JLK and FR were involved in collecting data.
Funding This study was funded by Green Shield Canada Foundation and University of Toronto Division of General Internal Medicine. FR is supported by an award from the Mak Pak Chiu and Mak-Soo Lai Hing Chair in General Internal Medicine, University of Toronto.
Disclaimer The funding agencies and Ontario Health had no role in the design, conduct or interpretation of this study, and the views expressed herein do not reflect the views of the organisations.
Competing interests AAV and FR are employees of Ontario Health.
Patient consent for publication Not required.
Ethics approval Research ethics board approval was obtained from all participating hospitals. A waiver of participant consent was obtained from research ethics boards of all participating hospitals.
Provenance and peer review Not commissioned; externally peer reviewed.
Data availability statement Data are available upon reasonable request. The study’s lead investigators will make data for this manuscript available upon request as possible in compliance with local research ethics board requirements and data sharing agreements.