Variation and statistical reliability of publicly reported primary care diagnostic activity indicators for cancer: a cross-sectional ecological study of routine data

Objectives Recent public reporting initiatives in England highlight general practice variation in indicators of diagnostic activity related to cancer. We aimed to quantify the size and sources of variation and the reliability of practice-level estimates of such indicators, to better inform how this information is interpreted and used for quality improvement purposes.
Design Ecological cross-sectional study.
Setting English primary care.
Participants All general practices in England with at least 1000 patients.
Main outcome measures Sixteen diagnostic activity indicators from the Cancer Services Public Health Profiles.
Results Mixed-effects logistic and Poisson regression showed that substantial proportions of the observed variance in practice scores reflected chance, variably so for different indicators (between 7% and 85%). However, after accounting for the role of chance, there remained substantial variation between practices (typically up to twofold variation between the 75th and 25th centiles of practice scores, and up to fourfold variation between the 90th and 10th centiles). The age and sex profile of practice populations explained some of this variation, by different amounts across indicators. Generally, the reliability of diagnostic process indicators relating to broader populations of patients most of whom do not have cancer (eg, rate of endoscopic investigations, or urgent referrals for suspected cancer (also known as ‘two week wait referrals’)) was high (≥0.80) or very high (≥0.90). In contrast, the reliability of diagnostic outcome indicators relating to incident cancer cases (eg, per cent of all cancer cases detected after an emergency presentation) ranged from 0.24 to 0.54, which is well below recommended thresholds (≥0.70).
Conclusions Use of indicators of diagnostic activity in individual general practices should principally focus on process indicators which have adequate or high reliability and not outcome indicators which are unreliable at practice level.


Indicator outlier values
Initial analysis identified outlier values for the three screening uptake indicators. These were identified visually from the distributions: in all three cases there was a very long tail towards implausibly low uptake values, inconsistent with the variability seen for the remaining practices, and in some cases there was even an indication of a secondary peak at these low values. These low values occurred in the presence of large denominators, indicating that they were unlikely to be due to chance. A cut-off was therefore chosen following visual inspection (Appendix 3).

Reliability is defined as the proportion of the total variance in unit scores that is attributable to true differences between units:

Reliability = σ²_between / (σ²_between + σ²_within / n)

where n is the number of observations per unit and, in the case of binary or rate indicators, the within-unit variance is assumed to follow the binomial or Poisson distribution respectively. In the context of this study a unit is a practice, but we use the terminology unit here to be more general.

Appendix 1b. Data sources and related periods
Following on from the definition above, reliability is often estimated by first estimating the between-unit variance and the within-unit variance. However, under the binomial or Poisson distribution, estimating the within-unit variance is not straightforward. Although various methods have been proposed for estimating the within-unit variance, we employ a method which does not directly use variance estimates. Instead we exploit the relationship between "Empirical Bayes" estimates of unit scores and maximum likelihood estimates of unit scores. "Empirical Bayes" estimates of unit scores (also known as Best Linear Unbiased Predictions, or BLUPs) are related to the observed scores (the maximum likelihood estimates) through reliability: specifically, the deviation of the observed score (on the appropriate scale) from the mean of unit scores is scaled by a factor equal to the inter-unit reliability. Thus, knowing both the "Empirical Bayes" estimate and the maximum likelihood estimate of a unit's score, we can obtain an estimate of reliability for each unit.
The first step in estimating unit reliabilities is to fit a mixed-effects generalised linear model which contains only a constant term and a random intercept for unit, i.e.

η_j = β_0 + u_j

In the case of proportion indicators η_j = logit(π_j), where π_j is the underlying proportion in unit j and the data within unit j are assumed to be binomially distributed. In the case of rate indicators η_j = log(λ_j), where λ_j is the underlying rate in unit j and the data within unit j are assumed to follow the Poisson distribution. In each case u_j represents a unit effect and is assumed to be normally distributed with a mean of zero, and β_0 is a constant term; both are on the log-odds scale for proportion indicators and the log-rate scale for rate indicators. Following fitting of the model, "Empirical Bayes" estimates of the unit effects, û_j, are obtained, which represent the best estimate of the deviation of unit j from the mean of all units, β_0.
The estimated inter-unit reliability for proportion indicators is given by

Reliability_j = û_j / (logit(p̂_j) − β_0)

and for rate indicators is given by

Reliability_j = û_j / (log(r̂_j) − β_0)

where p̂_j and r̂_j are the observed proportion and rate in unit j respectively.
Initial work showed that for binary indicators, reliability estimated in this way was indistinguishable from that estimated using the method applied by