Original article
Effect of applying different “levels of evidence” criteria on conclusions of Cochrane reviews of interventions for low back pain

https://doi.org/10.1016/S0895-4356(02)00498-5

Abstract

The objective of this study was to examine the consistency of conclusions of Cochrane systematic reviews when different criteria are used to determine levels of evidence. We reanalyzed the data in six Cochrane reviews of conservative treatment of low back pain by applying three additional sets of “levels of evidence” criteria. Overall agreement between the conclusions attained with the different levels of evidence criteria was only “fair” (multirater kappa coefficient 0.33; 95% CI 0.28 to 0.38). For example, the four sets of levels of evidence criteria produced four conclusions on the efficacy of back school: “strong evidence that back schools are effective,” “weak evidence,” “limited evidence,” and “no evidence.” Pairwise agreement between the four pooling systems ranged from slight to substantial (kappas ranging from 0.10 to 0.80). Different rules for determining levels of evidence in systematic reviews produce markedly different conclusions on treatment efficacy.

Introduction

Low back pain is a major problem in most industrialised countries. Although the condition is associated with enormous social and economic costs, there is not yet agreement on the most effective way to manage it. One factor contributing to this uncertainty is the contradictory results of studies that have evaluated interventions for low back pain.

Many commentators argue that the strongest evidence for a therapy is provided by a systematic review of all relevant randomised controlled trials (RCTs) [1]. Systematic reviews provide a method to efficiently synthesize the results of individual studies. A single review may synthesize the results of many RCTs and reduce these down to a series of brief summary statements on the efficacy of the various therapies. For example, a review by Van Tulder et al. [2] reduced the findings of 12 RCTs to a single summary statement: “There is strong evidence that exercise therapy is not more effective for acute low back pain than inactive or other active treatments with which it has been compared” (p. 2784).

This qualitative approach to synthesis, sometimes called the “levels of evidence” approach, has been used in a large number of systematic reviews of therapy. Typically, in the conclusion of these reviews, one of four levels of evidence (“strong,” “moderate,” “limited or conflicting,” and “no evidence”) is used to describe the strength of evidence for a therapy. These brief summaries are appealing because they provide a conclusion that is easy to recall. Researchers have justified the levels of evidence approach on the grounds that it provides a conclusion that incorporates both the outcomes and quality of included studies [3]. The approach has also been advocated as a method to pool results of individual RCTs when heterogeneity precludes quantitative pooling [4].

Although the levels of evidence approach to synthesis has become popular in systematic reviews, the criteria used to judge the level of evidence have not yet been standardized. Different authors of systematic reviews employ different criteria [2, 3, 5], and the same author may use different criteria in different studies [2, 6]. Differences in criteria usually involve methodologic decisions such as whether or not to include nonrandomized trials [5], or how to weight low-quality studies [2, 3]. However, although the rules for pooling vary across studies, very similar narrative descriptors are used to describe the results of pooling.

This lack of standardization in levels of evidence pooling rules means that there is the potential for different conclusions to be drawn from the same set of RCTs. If different pooling rules produced different conclusions on the efficacy of a therapy, serious questions would be raised about the validity of the pooling rules. If two systems produce markedly different conclusions from the same RCTs, both systems cannot be valid.
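To illustrate how this can happen, the following minimal Python sketch applies two pooling rule sets to the same set of trials. Both rule sets, the `Trial` fields, and the thresholds are hypothetical and are not the criteria used in any of the reviews discussed here; they only show that a difference in how low-quality trials are weighted can change the narrative label.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    high_quality: bool  # passed a (hypothetical) methodologic quality threshold
    positive: bool      # favoured the treatment over the comparison


def rule_set_a(trials):
    """Hypothetical rule set A: only high-quality trials are considered."""
    considered = [t for t in trials if t.high_quality]
    if not considered:
        return "no evidence"
    # Any disagreement in direction among the considered trials is a conflict.
    if len({t.positive for t in considered}) > 1:
        return "conflicting"
    return "strong" if len(considered) >= 2 else "moderate"


def rule_set_b(trials):
    """Hypothetical rule set B: all trials count equally, judged by majority."""
    if not trials:
        return "no evidence"
    share_positive = sum(t.positive for t in trials) / len(trials)
    # Assumed cut-offs: fewer than 75% of trials in one direction is a conflict.
    if 0.25 < share_positive < 0.75:
        return "conflicting"
    return "strong" if len(trials) >= 2 else "limited"


# One high-quality positive trial, three low-quality positive, one low-quality negative.
trials = [
    Trial(high_quality=True, positive=True),
    Trial(high_quality=False, positive=True),
    Trial(high_quality=False, positive=True),
    Trial(high_quality=False, positive=True),
    Trial(high_quality=False, positive=False),
]

print(rule_set_a(trials))  # moderate
print(rule_set_b(trials))  # strong
```

Here the same five trials are labelled "moderate" under one rule set and "strong" under the other, solely because the second rule set lets low-quality trials contribute to the count.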

A number of systematic reviews from the Cochrane Collaboration Back Review Group have used a levels of evidence approach to pooling. There is a belief that Cochrane reviews are of higher methodologic quality than non-Cochrane reviews, and there are some data to support this belief [7]. The objective of the present study was to examine the consistency of conclusions of Cochrane systematic reviews of conservative treatment of low back pain when different levels of evidence criteria are applied.

Data extraction

Cochrane reviews of conservative treatment of low back pain (LBP) published in the period January 2000 to May 2001 were retrieved from the Cochrane database. Reviews were included in the study if the author had used a “levels of evidence” approach to pool results of individual RCTs.

We then inspected each of these Cochrane reviews and identified each treatment comparison for which pooling had been undertaken and recorded the conclusion on efficacy. For example, Van Tulder et al. [8] concluded that, for the

Results

Six Cochrane systematic reviews fulfilled the inclusion criteria. These systematic reviews included the following conservative interventions for LBP: exercise [2], acupuncture [4], back school [13], lumbar supports [6], cognitive behavioral therapy [8], and massage [3]. From these six reviews a total of 60 treatment comparisons were identified.
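Agreement among several sets of criteria that each assign one label to every treatment comparison can be summarised with a multirater kappa. The sketch below computes Fleiss' kappa in Python and maps a kappa value to a descriptive band. It assumes Fleiss' formulation and the Landis and Koch bands, which may differ from the generalized kappa and the scale used in this study; the function names and the example labels are invented for illustration.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list with one label per rater.

    Here an item is a treatment comparison and a rater is one set of
    levels of evidence criteria. Every item needs the same number of labels.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("each item needs the same number (>= 2) of labels")

    counts = [Counter(item) for item in ratings]

    # Observed agreement: mean proportion of agreeing rater pairs per item.
    per_item = [
        (sum(c * c for c in cnt.values()) - n_raters) / (n_raters * (n_raters - 1))
        for cnt in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall share of each label.
    totals = Counter()
    for cnt in counts:
        totals.update(cnt)
    p_expected = sum((n / (n_items * n_raters)) ** 2 for n in totals.values())

    if p_expected == 1.0:
        raise ValueError("kappa is undefined when only one label is ever used")
    return (p_observed - p_expected) / (1.0 - p_expected)


def landis_koch_label(kappa):
    """Descriptive band for a kappa value (Landis and Koch, assumed scale)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Invented labels: three comparisons, each rated by four sets of criteria.
example = [
    ["strong", "moderate", "limited", "no evidence"],
    ["strong", "strong", "strong", "moderate"],
    ["no evidence", "no evidence", "limited", "no evidence"],
]
kappa = fleiss_kappa(example)
print(round(kappa, 2), landis_koch_label(kappa))
```

Under these assumed bands, a kappa of 0.33 falls in the "fair" range, and values of 0.10 and 0.80 fall in the "slight" and "substantial" ranges respectively.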

The generalized kappa coefficient for the agreement between the four sets of levels of evidence criteria was 0.33 (95% CI = 0.28–0.38) indicating

Discussion

The results of the present study show that the overall agreement between the four sets of levels of evidence criteria was fair. Correcting the kappa values for attenuation due to imperfect reliability improved our estimates of agreement between schemes, but not nearly enough for clinicians and policy makers to have confidence that the systems measure level of evidence reliably. This means that there is potential for markedly different conclusions from systematic reviews if different sets of

Conclusion

Different levels of evidence criteria produce different conclusions on treatment efficacy. We advise readers to be cautious when interpreting conclusions of systematic reviews that use a levels of evidence approach.

References (15)

  • D.A.W.M. Van der Windt et al. Ultrasound therapy for musculoskeletal disorders: a systematic review. Pain (1999)
  • National Health and Medical Research Council. How to use the evidence: assessment and application of scientific evidence (2000)
  • M. Van Tulder et al. Exercise therapy for low back pain: a systematic review within the framework of the Cochrane Collaboration Back Review Group. Spine (2000)
  • A.D. Furlan et al. Massage for low back pain (Cochrane review). Issue 4 (2000)
  • M.W. Van Tulder et al. Acupuncture for low back pain (Cochrane review). Issue 3 (2000)
  • M.N.M. Van Poppel et al. A systematic review of controlled clinical trials on the prevention of back pain in industry. Occup Environ Med (1997)
  • M.W. Van Tulder et al. Lumbar supports for prevention and treatment of low back pain (Cochrane review). Issue 3 (2000)
There are more references available in the full text version of this article.

Cited by (60)

  • Aquatic exercise & balneotherapy in musculoskeletal conditions

    2012, Best Practice and Research: Clinical Rheumatology
    Citation Excerpt :

    All three papers used similar methodology with comparable selection criteria. None of the articles reported data from statistical pooling, rather the authors provided a conclusion based on a level of evidence synthesis [22]. This has been an acceptable way of analysis for a period when statistical pooling was not possible due to heterogeneity concerning patient population interventions and outcomes.

  • Challenges in guideline methodology

    2011, Journal of Clinical Epidemiology
  • A Systematic, Critical Review of Manual Palpation for Identifying Myofascial Trigger Points: Evidence and Clinical Significance

    2008, Archives of Physical Medicine and Rehabilitation
    Citation Excerpt :

    We were aware that level of evidence (LOE) criteria could influence the results of this systematic review [25]. Given the variability in trigger point reporting, we opted for what Ferreira et al [25] termed a moderate LOE strategy [20]. Strong evidence required generally consistent findings in more than 1 high-quality study, moderate evidence required generally consistent findings in 1 high-quality study and 1 or more low-quality studies, or in multiple low-quality studies and insufficient evidence when only 1 study was available or inconsistent findings were observed in multiple studies.
