Table 4

Sources of reviewer disagreements

Source of disagreement | Source description | Literature examples
Omissions | Some disagreements were associated with simple reviewer mistakes, that is, one reviewer overlooking reported information | Several disagreements were simply due to one reviewer overlooking reported information and did not seem to follow any pattern (random errors). However, the low agreement in the Spread domain seemed to be partly due to information being ‘buried’ in the discussion section. Omission-based disagreement was also encountered repeatedly for the domain Organisational characteristics, due to information not being reported in the main manuscript text but elsewhere, for example in the author's biography32
Interpretation of reported information | Some disagreements were associated with the interpretation of the information that was reported in the publication | The low agreement in the domain Adherence/fidelity was to some extent associated with publications where adherence was the main outcome or the outcome and the intervention were identical (eg, guideline implementation to improve adherence to evidence-based practices).33 A further example was whether reviewers considered a state-wide initiative sufficient to infer the motivation to participate for all included hospitals.34 Multiple-site studies often do not provide information on individual facilities35 and studies in low-income countries may have had an initiating body that was not a healthcare delivery organisation,36 and reviewers disagreed on the extent to which they extrapolated from the presented information to individual organisations. Disagreements in the Health Outcome domain were associated with the type of outcome and how systematically data were collected in order to be recognised as a health outcome/data37
Interpretation of criteria | Despite the careful, iterative development of the tool, some disagreements were associated with the interpretation of the scoring criteria. Given the large scope of interventions included in the test set, some ambiguities could not be resolved | Identified disagreement in the domain Intervention Rationale was associated with publications where only highly selective intervention components were linked to existing empirical literature and reviewers disagreed whether the specific aspect was sufficient to meet the criterion.38 Disagreements in the Comparator domain were associated with the question of how much detail was considered sufficient to meet the quality criterion, for example, if only a component of the usual care was described.34 Disagreements also occurred when publications described a structural change without information on the uptake, for example, the installation of a comfort room for patients where it was not reported whether the room was used in clinical practice; hence reviewers had to decide whether the intervention was the installation of the room or the use of the room39
  • Examples were taken from the validation sample (N=54 publications); rater agreement is documented in table 3.

  • Mistakes (omissions) as well as remaining ambiguity (interpretation of reported information and interpretation of criteria) were sources of disagreement between literature reviewers. A qualitative analysis of the disagreements pointed to some systematic, rather than random, reviewer errors.