
Do the stars align? Distribution of high-quality ratings of healthcare sectors across US markets
  1. Jose Figueroa1,2,
  2. Yevgeniy Feyman1,
  3. Daniel Blumenthal3,4,
  4. Ashish Jha5
  1. 1 Health Policy and Management, Harvard T H Chan School of Public Health, Cambridge, Massachusetts, USA
  2. 2 Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts, USA
  3. 3 Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts, USA
  4. 4 Department of Medicine, Harvard Medical School, Boston, Massachusetts, USA
  5. 5 Health Policy and Management, Harvard University, Boston, Massachusetts, USA
  1. Correspondence to Dr Ashish Jha, Harvard University, Health Policy and Management, Boston, Massachusetts, 02115, USA; ajha{at}


Background The US government created five-star rating systems to evaluate hospital, nursing home, home health agency and dialysis centre quality. The degree to which quality is a property of organisations versus geographical markets is unclear.

Objectives To determine whether high-quality healthcare service sectors are clustered within US healthcare markets.

Design Using data from the Centers for Medicare and Medicaid Services’ Hospital, Dialysis, Nursing Home and Home Health Compare databases, we calculated the mean star ratings of four healthcare sectors in 304 US hospital referral regions (HRRs). For each sector, we ranked HRRs into terciles by mean star rating. Within each HRR, we assessed concordance of tercile rank across sectors using a multirater kappa. Using t-tests, we compared characteristics of HRRs with three to four top-ranked sectors, one to two top-ranked sectors and zero top-ranked sectors.

Results Six HRRs (2.0% of HRRs) had four top-ranked healthcare sectors, 38 (12.5%) had three top-ranked health sectors, 71 (23.4%) had two top-ranked sectors, 111 (36.5%) had one top-ranked sector and 78 (25.7%) HRRs had no top-ranked sectors. A multirater kappa across all sectors showed poor to slight agreement (K=0.055). Compared with HRRs with zero top-ranked sectors, those with three to four top-ranked sectors had higher median incomes, fewer black residents, lower mortality rates and were less impoverished. Results were similar for HRRs with one to two top-ranked sectors.

Conclusions Few US healthcare markets exhibit high-quality performance across four distinct healthcare service sectors, suggesting that high-quality care in one sector may not be dependent on or improve care quality in other sectors. Policies that promote accountability for quality across sectors (eg, bundled payments and shared quality metrics) may be needed to systematically improve quality across sectors.

  • quality measurement
  • health policy
  • healthcare quality improvement



The degree to which healthcare quality is a property of individual organisations versus geographical healthcare markets is not well understood. While literature on healthcare outcomes routinely measures patient outcomes at geographical, market or facility levels,1–4 such analyses typically do not examine performance across multiple healthcare sectors. But defining a healthcare market as ‘high-quality’ would seemingly necessitate understanding the quality across multiple care settings and whether outcomes in one setting are related to those in another.

High-quality hospitals, for instance, might pressure other types of delivery organisations, such as skilled nursing facilities, in their communities to improve care. Therefore, communities with many high-quality hospitals may have other types of high-performing delivery organisations. Conversely, low-quality hospitals may inadvertently promote care inefficiencies—by not conducting adequate discharge planning—that result in worse outcomes, or at least confusion, for patients.5 6 In this case, we might see a clustering of overall low-quality institutions. Another possible outcome is a middle-ground—if these institutions largely function independently, such that care quality in one sector does not influence quality in others, then one would expect to see little community-level correlation among facilities in different sectors. Which scenario predominates has substantial implications for policymakers. If market-level forces drive quality across different sectors of the healthcare system, then intervening on one part may improve care across multiple sectors. If not, broader approaches, such as bundled payments with shared quality metrics, or a sector-by-sector strategy may be needed.

Recently, in the USA, the Centers for Medicare and Medicaid Services (CMS) began rating four types of healthcare service sectors, namely hospitals,7 nursing homes,8 home health agencies9 and dialysis centres,10 using a publicly available ‘five-star rating’ system. This presents a unique opportunity to determine whether high-quality facilities, as defined by CMS, are clustered across particular healthcare markets. CMS star ratings are based on measures of utilisation, processes, outcomes and patient experience, which rarely overlap between sectors.11–14 For instance, hospitals are rated on 57 measures, which include clinical processes and patient outcomes like 30-day readmissions and mortality for common conditions.12 Meanwhile, home health agencies are rated on nine measures—three of which are process measures and six are home health-oriented outcome measures. Nursing homes are rated based on results of health inspections, staffing and process measures, while dialysis centres are rated based on nine quality measures, a mix of outcomes and processes as well. These ratings are released annually on the federal government’s compare websites.

Therefore, in this study, we sought to answer two key questions. First, are highly rated healthcare sectors clustered across certain US healthcare markets? Second, if so, are there meaningful differences in the characteristics of healthcare markets with top-ranked healthcare service sectors versus markets with no top-ranked healthcare service sectors?



We obtained publicly available star rating data and facility size information from the CMS compare websites. We used 2017 files for home health, nursing homes and dialysis facilities.15–17 We used 2016 annual files for hospitals.18 These files were the latest publicly available at the time of the analysis. The methodology for determining the star rating varies across each of the four healthcare sectors as well as the number of measures on which these were based11–14 (see table 1 for full list of measures; please see online supplementary appendix file 1). We obtained hospital size information from the 2014 American Hospital Association (AHA) survey, the latest data available to us for purposes of this study.19 Of note, the performance measures in the 2016–2017 hospital star rating are generated from administrative claims data in 2014, the year of the AHA survey.

Supplementary file 1

Table 1

Characteristics of US government’s five-star rating systems across four different healthcare sectors

Not all facilities in the country receive star ratings. For instance, facilities with a low volume of Medicare beneficiaries do not typically receive a star rating. The minimum number varies by health sector and depends on each individual measure. In addition, some specialty hospitals, for example federal Veterans Affairs hospitals, children’s hospitals and cancer-specialty hospitals, are excluded from participating in the quality reporting programme. Facilities without a star rating were therefore excluded from our analytic sample. For the sectors that included size information (we did not have size data available for home health agencies), we observed that facilities missing a star rating were smaller than those with one. Across all sectors, 12.3% of the facilities in the HRRs in our final analytic sample were missing star rating data. The rate of missing data was largely driven by hospitals and home health agencies (23.3% and 23.6%, respectively, were missing star rating data). The majority of nursing homes and dialysis centres received star ratings. In a few hospital referral regions (HRRs), a small number of facilities drive the results of our analysis. Twelve HRRs (4.0% of our sample) have two or fewer hospitals, 9 HRRs (3.0% of our sample) include two or fewer home health agencies and 9 HRRs (3.0% of our sample) include two or fewer dialysis facilities, while no HRR has fewer than two skilled nursing facilities.

Market-level data were obtained from the Dartmouth Atlas20 and zip code tabulation area (ZCTA)-level community characteristics from American FactFinder.21 These data were generated from the 5-year (2011–2015) American Community Survey files, which is the most recent version of the survey available, and allows the most precise estimates at relatively small geographical areas. We converted ZCTA data to zip codes using a crosswalk, and aggregated zip code-level data at the HRR level using population-weighted averages. HRRs are geographical areas that represent independent healthcare markets.22 Two HRRs were excluded since they were not represented in the fully merged data. On average, each HRR in our analytic sample had 19.4 dialysis facilities, 11.7 hospitals, 30.1 home health agencies and 50.5 nursing homes. Our final analytic sample included 304 HRRs, with data from 33 939 facilities out of 38 679 total across these HRRs.
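The aggregation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study (which was run in Stata): the function name `aggregate_to_hrr`, the toy zip codes, populations and income values, and the HRR labels are all hypothetical.

```python
from collections import defaultdict


def aggregate_to_hrr(zip_rows, zip_to_hrr):
    """Population-weighted mean of a zip-level value within each HRR.

    zip_rows: iterable of (zip_code, population, value) tuples.
    zip_to_hrr: dict mapping zip_code -> HRR identifier (the crosswalk).
    Zip codes missing from the crosswalk or with no population are skipped.
    """
    weighted_sum = defaultdict(float)
    total_pop = defaultdict(float)
    for zip_code, population, value in zip_rows:
        hrr = zip_to_hrr.get(zip_code)
        if hrr is None or population <= 0:
            continue
        weighted_sum[hrr] += population * value
        total_pop[hrr] += population
    return {hrr: weighted_sum[hrr] / total_pop[hrr] for hrr in total_pop}


# Hypothetical toy data: one community characteristic by zip code.
rows = [("02115", 1000, 50000.0), ("02116", 3000, 70000.0), ("10001", 2000, 60000.0)]
crosswalk = {"02115": "Boston", "02116": "Boston", "10001": "Manhattan"}
hrr_income = aggregate_to_hrr(rows, crosswalk)
```

In this toy example the larger zip code pulls the Boston value towards its own, which is the intended effect of weighting by population.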


We ranked HRRs four times, once per sector, by facility-size-weighted mean star rating, and divided each ranking into terciles. Given that we are interested in measuring market-level quality, weighting is necessary in order to give greater importance to facilities that are more likely to provide a larger share of care within a market. This allows us to better account for the exposure of a market’s population to each facility. Prior work aggregating facility-level outcomes to the market level applied similar methods.1 For each HRR, we determined how many of the four healthcare sectors fell into each tercile. Next, we grouped HRRs into three categories: those with three or four healthcare sectors ranked in the top tercile, those with one or two sectors in the top tercile, and those with no sectors in the top tercile.
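The ranking procedure can be sketched in a few lines. This is an illustrative sketch rather than the study's own code: the function names, the tie-breaking rule and the toy facilities (two sectors, three HRRs) are assumptions made for the example.

```python
def weighted_mean(pairs):
    """Size-weighted mean of (star_rating, facility_size) pairs."""
    total = sum(size for _, size in pairs)
    return sum(star * size for star, size in pairs) / total


def tercile_ranks(scores):
    """Map each key to a tercile: 3 = top third, 1 = bottom third.

    Ties are broken by sort order, a simplification assumed here.
    """
    ordered = sorted(scores, key=scores.get)  # ascending by score
    n = len(ordered)
    return {key: (i * 3) // n + 1 for i, key in enumerate(ordered)}


def quality_group(n_top):
    """Group an HRR by its number of top-tercile sectors."""
    if n_top >= 3:
        return "three to four"
    if n_top >= 1:
        return "one to two"
    return "zero"


# Hypothetical facilities: sector -> HRR -> list of (stars, size).
facilities = {
    "hospital": {"A": [(5, 100), (3, 300)], "B": [(2, 200)], "C": [(4, 50), (4, 150)]},
    "nursing home": {"A": [(2, 80)], "B": [(5, 60), (3, 60)], "C": [(3, 120)]},
}

top_counts = {"A": 0, "B": 0, "C": 0}
for sector, by_hrr in facilities.items():
    means = {hrr: weighted_mean(pairs) for hrr, pairs in by_hrr.items()}
    for hrr, tercile in tercile_ranks(means).items():
        if tercile == 3:
            top_counts[hrr] += 1

groups = {hrr: quality_group(n) for hrr, n in top_counts.items()}
```

Ranking each sector separately means an HRR's tercile in one sector carries no information about its tercile in another, so any clustering has to come from the data.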

We then compared community characteristics between these categories using two-tailed t-tests and calculated a multirater kappa (Fleiss’ kappa) to measure concordance of tercile rank across sectors.23 Fleiss’ kappa permits measuring concordance across multiple groups without the need for a reference group. We rely on interpretations similar to standard kappa scores (lower values mean less concordance) with the following ranges: <0 less than chance agreement; 0.01–0.20 slight agreement; 0.21–0.40 fair agreement; 0.41–0.60 moderate agreement; 0.61–0.80 substantial agreement; 0.81–0.99 almost perfect agreement.24 All analyses were performed using Stata V.14.2.
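For readers unfamiliar with the statistic, the following is a minimal sketch of the textbook Fleiss' kappa calculation, treating each HRR as a subject, each sector as a rater and each tercile as a category. It is not the Stata routine used in the study. The function names, the toy counts and the handling of values that fall between the quoted bands are assumptions.

```python
def fleiss_kappa(count_rows):
    """Fleiss' kappa for rows of per-category counts.

    Each row is one subject (here, an HRR); each entry is how many raters
    (here, sectors) placed the subject in that category (here, a tercile).
    Every row must sum to the same number of raters.
    """
    n_subjects = len(count_rows)
    n_raters = sum(count_rows[0])
    n_categories = len(count_rows[0])

    # Share of all assignments falling in each category.
    p_cat = [
        sum(row[j] for row in count_rows) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement per subject, averaged over subjects.
    p_obs = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in count_rows
    ) / n_subjects
    p_exp = sum(p * p for p in p_cat)  # agreement expected by chance
    return (p_obs - p_exp) / (1 - p_exp)


def interpret(kappa):
    """Label a kappa value using the bands quoted in the text.

    Values between bands (e.g. 0.205) go to the next band up; an assumption.
    """
    if kappa < 0:
        return "less than chance agreement"
    for upper, label in [
        (0.20, "slight agreement"),
        (0.40, "fair agreement"),
        (0.60, "moderate agreement"),
        (0.80, "substantial agreement"),
    ]:
        if kappa <= upper:
            return label
    return "almost perfect agreement"


# Hypothetical HRRs: counts of sectors in the bottom, middle and top tercile.
toy_rows = [[4, 0, 0], [2, 1, 1], [0, 2, 2], [1, 1, 2]]
toy_label = interpret(fleiss_kappa(toy_rows))
```

The sketch does not guard against the degenerate case in which every assignment falls in a single category, where chance agreement equals 1 and the ratio is undefined.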

Patient involvement

No patients were involved in setting the research question or the outcome measures, nor were they involved in developing plans for implementation of the study. No patients were asked to advise on interpretation or writing up of the results. There are no plans to disseminate the results of the research to study participants or the relevant patient community.

This study was exempt from institutional review board review because the data were publicly available and contained no patient-identifiable information.


Of 304 HRRs in our sample, 6 (2.0%) had all four healthcare sectors ranked (by mean) in the top tercile, 38 (12.5%) had three health sectors ranked in the top tercile, 71 (23.4%) had two health sectors in the top tercile, 111 (36.5%) had one health sector in the top tercile, while 78 (25.7%) HRRs had no healthcare sectors ranked in the top tercile (table 2). In all cases, this was similar to what we might expect due to chance alone. An overall multirater kappa across all sectors (0.055; p<0.001) confirmed that tercile rankings across sectors, within an HRR, had poor to slight agreement.

Table 2

Distribution of health sectors among top and bottom terciles of mean quality rating

HRRs, hospital referral regions.

While we observed little concordance between sectors within HRRs, we found significant differences on several demographic measures between HRRs with three or four top-ranked health sectors and zero top-ranked healthcare sectors (table 3). Compared with ‘low-quality’ markets (those with zero top-ranked health sectors), ‘high-quality’ markets (HRRs with three or four top-ranked health sectors) had higher median incomes (p<0.001), less poverty (p<0.001), fewer black residents (p<0.001), lower mortality among Medicare beneficiaries (p<0.001) and spent less per Medicare beneficiary overall (p<0.001). There were no significant differences in market supply measures except for fewer hospital beds in high-quality markets (p<0.001). Results were qualitatively similar in a comparison with ‘mid-quality’ HRRs (those with one or two top-ranked health sectors); however, spending per Medicare beneficiary was no longer significant relative to low-quality markets.

Table 3

Characteristics of HRRs with highly ranked health sectors (weighted by facility size)


We found that few US healthcare markets perform well across all four healthcare service sectors as rated by CMS. Only 2% of HRRs ranked in the top tercile of quality rating across hospitals, dialysis centres, nursing homes and home health agencies. These markets were wealthier, had fewer minority patients and spent less per Medicare beneficiary than HRRs with no top-ranked healthcare service sectors.

The lack of a stronger clustering among healthcare sectors within markets may have several potential explanations. First, there may be few community-level drivers of quality, possibly due to the silos created by our payment system, which offers little incentive for high-performing sectors to hold poor-performing sectors accountable. To the extent this is true, it suggests that thinking of quality as an intrinsic feature of healthcare markets broadly is insufficient, at least under existing payment schemes. More importantly, it underscores the need for changes in payment models to emphasise quality across the full spectrum of healthcare, for instance, by making acute care providers partly responsible for outcomes in the postacute setting. New York’s Delivery System Reform Incentive Payment programme, as one example, created large delivery systems throughout the state which are tasked with managing health for their assigned Medicaid populations.25 These systems include both acute and postacute providers, such as hospitals and skilled nursing facilities, respectively. It is hoped that this integration will lead to more appropriate incentives for providers to hold each other accountable.

Another explanation of low concordance of highly rated facilities across health sectors might simply be that CMS star ratings are measuring quality with enough imprecision to make it difficult to detect market-level signals of quality. If CMS star ratings are in fact not measuring intrinsic healthcare quality, then there is no reason to expect clustering of high-quality facilities between sectors. Any observed clustering in this case would be purely random. Indeed, whether star ratings actually measure ‘quality’ is an open, unresolved question. Research has found that hospitals with five stars have both lower readmission rates and lower 30-day mortality rates as compared with one-star hospitals after accounting for hospital and market-level factors.26 However, some have raised concerns that these ratings may nevertheless be misleading given that well-known academic medical centres considered high-quality by other ratings failed to receive top marks despite also having lower mortality than non-academic centres.27 28 It is possible that risk-adjustment methods do not appropriately take into account the medical complexity of patients in other measures, for example, in patient experience and measures of safety of care,29 which then influences the overall star rating. If this explanation dominates, then the federal government should consider developing better measures or a more reliable risk-adjustment methodology. Another issue stems from the fact that many measures, for example in nursing homes, dialysis centres and home health agencies, are self-reported by each facility. Prior research has found that self-reported measures in nursing homes are often discordant with state health inspection scores.30 To the extent that self-reported measures artificially inflate star ratings, differences between high-rated and low-rated facilities may reflect reporting practices rather than underlying quality.

One final possible explanation is that star ratings do measure quality, but demographic factors tend to be more important in determining concordance and outcomes. This suggests that star ratings may not adequately adjust for socioeconomic status. The differences we observe across several demographic and economic measures between markets with high-quality concordance and those without do hint at there being some signal of quality in the star ratings, but that these market-level characteristics are more predictive of quality. To the extent this is true, it suggests that measures should more adequately adjust for socioeconomic determinants of health.

Regardless of the true underlying explanation, our findings support the need for improved measures of healthcare quality and payment system reform. This is particularly pressing given that CMS has begun paying facilities based on performance on several of the star rating measures across each sector. Pay-for-performance programmes include the Hospital Value-Based Purchasing programme31 for hospitals, the End-Stage Renal Disease Quality Incentive Program32 for dialysis centres, and most recently the Home Health Value-Based Purchasing programme.33 In the future, CMS also plans to pay nursing homes based on performance.34

There are several limitations to our study. First, as noted earlier, it is still up for debate how well star ratings measure intrinsic healthcare quality, which offers one explanation of why we may see poor concordance of highly rated facilities across US healthcare markets. Our study therefore raises important considerations around refinement of current measures and the possible need for better adjustment for socioeconomic factors. Second, not all facilities receive star ratings. Some are too small while others are excluded from participation in the quality reporting programme. Lastly, our analysis focuses on the overall star rating, which consists of multiple measures of clinical processes, outcomes and patient experience across each facility. It is possible that there may be more concordance among certain submeasures that we are not capturing when using the overall star rating. Future research should investigate the components of star ratings in this respect.

In conclusion, we found that few markets, which were generally wealthier with fewer minorities, performed well across all four healthcare sectors as measured by CMS star ratings. This suggests that broader approaches (such as bundled payments and/or shared metrics) or a sector-by-sector strategy may be needed to improve overall quality across US healthcare markets.

Note: Percentages were rounded to one decimal point and may thus not add up to 100% exactly.

Random chance calculation: $P(X = x) = \binom{4}{x} \left(\tfrac{1}{3}\right)^{x} \left(\tfrac{2}{3}\right)^{4-x}$, where x is the desired number of health sectors rated in a given tercile, and where $\binom{4}{x}$ is the binomial coefficient $\frac{4!}{x!\,(4-x)!}$.
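A chance benchmark of this kind can be sketched as a binomial calculation. The sketch assumes, as a simplification, that the four sectors are ranked independently and that each has a one-in-three chance of landing in a given tercile. The function name and parameter names are hypothetical.

```python
from math import comb


def chance_probability(x, n_sectors=4, p_tercile=1 / 3):
    """Probability that exactly x of n_sectors fall in a given tercile,
    assuming independent sectors with probability p_tercile each."""
    return comb(n_sectors, x) * p_tercile**x * (1 - p_tercile) ** (n_sectors - x)


# Expected share of HRRs with 0, 1, 2, 3 or 4 sectors in a given tercile.
expected = {x: chance_probability(x) for x in range(5)}
```

Under these assumptions, the probability that all four sectors fall in the top tercile is 1/81, or about 1.2%, which can be set against the 2.0% of HRRs observed.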



  • Contributors All authors have been directly involved in the planning, conduct and reporting of the analysis.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement All data from this study are publicly available and referenced in the manuscript. Any interested parties will be able to reproduce our results using these data.