Abstract
Purpose
To evaluate the methodological quality of randomized controlled trials (RCTs) published in Intensive Care Medicine from 2001 to 2010, and to compare it with a previous review of RCTs published from 1975 to 2000.
Methods
We assessed the quality of reporting of randomization, blinding and participant flow, both individually and combined within the Jadad scale, and compared them with findings from our previous review. For RCTs published from 2001 to 2010, we also evaluated the frequency of distorted finding presentation (spin) and inflated predicted treatment effect (delta inflation).
Results
In the 221 RCTs from 2001 to 2010, the sample size was significantly larger than in the older series, and there was a higher proportion of studies with negative findings. Reporting of the rationale for sample size estimation and allocation concealment increased significantly, but reporting of other important individual methodological components did not change substantially compared with the previous period and remained low. Among RCTs from 2001 to 2010, a spin strategy was used in 69 of 111 RCTs with statistically negative results, while delta inflation was present in 7 of 11 RCTs evaluating survival as a primary outcome. Papers with higher Jadad scores were cited more often than the others.
Conclusions
Quality of reporting of RCTs published in Intensive Care Medicine has only partly improved over time, and spin and delta inflation are frequent. There is a need for stronger adherence to CONSORT recommendations, with special emphasis on accurate description of randomization and blinding, and correct reporting of statistically non-significant results.
Introduction
The randomized controlled trial (RCT) is considered the highest level of evidence available for evaluating new therapies. Results of adequately powered RCTs are more definitive than any other type of clinical research information. As such, RCTs represent one of the most reliable sources of evidence to guide clinical practice. However, the methodological quality of an RCT can influence the validity, accuracy and reproducibility of its results [1]. Flaws in the methodological quality of an RCT have been associated with biased estimates of treatment effect and efficacy [2–7]. Methodological quality is defined as “the confidence that the trial design, conduct, and analysis have minimized or avoided biases in its treatment comparisons” [1], whereas reporting quality is defined as “the provided information about the design, conduct and analysis of the trial” [1]. Inadequate reporting makes the interpretation of studies difficult or impossible. Since the quality of an RCT can be judged only on the basis of what has been reported, quality of reporting has been used as a measure of methodological quality. Quality is judged inadequate unless information to the contrary is reported (a “guilty until proven innocent” approach), which is often justified by the fact that faulty reporting generally reflects faulty methods [8]. Inadequate or inaccurate reporting is common in medical journals. Deficiencies have been documented in the reporting of the methods used to randomly assign participants to comparison groups, analyze the data and ensure blinding of outcome evaluation, and in the reporting of primary and secondary endpoints and sample size calculations. However, the quality of reporting may not necessarily reflect the methodological quality of the study, because well-conducted trials may be reported badly [9–12].
Specific scales, such as the Jadad scale, have been developed to evaluate the methodological quality of clinical trials [9]. Presence and appropriateness of randomization, blinding and reporting of withdrawals, which are key indicators of the quality of RCTs [8], are all included in the Jadad scale [9]. However, other features not included in the Jadad scale can influence the quality of RCTs, including correct sample size calculation, allocation concealment and intention-to-treat analysis. Therefore, although summary scales provide a useful synthetic representation of RCT quality, all relevant methodological components should also be individually evaluated.
A sample size calculation is crucial to guarantee that the study has adequate power to detect the treatment effect and to minimize the risk of false negative findings, a major concern in clinical research. Sample size calculations need to be performed carefully, since incorrect calculations can be misleading; a new indicator of possible bias, referred to as “delta inflation”, has recently been proposed to assess this [13]. Specification of the expected frequency of the outcome in each study group is an important step in sample size calculations for RCTs, since this defines the clinically relevant and scientifically plausible treatment effect targeted by the study. Delta is the predicted effect size of the treatment under study compared to the control treatment on a pre-specified outcome, chosen as the one of greatest importance to relevant stakeholders. The other elements of the sample size calculation are the significance level required for rejection of the null hypothesis and the statistical power. Delta inflation represents an overestimation of the expected treatment effect size [13]. Compared to misspecification of the other variables, delta inflation has a larger impact on the required sample size, and it is common in RCTs investigating therapies for critical illness published in high impact journals [13]. Delta inflation may result in RCTs that have an inadequate sample size to detect genuine differences between the investigated treatments, which can lead to false negative findings.
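The interplay of these ingredients, and the outsized effect of delta on the required sample size, can be sketched with the standard normal-approximation formula for comparing two proportions; the function name and the example event rates below are illustrative, not drawn from the study:

```python
import math
from statistics import NormalDist

def sample_size_per_group(p_control, delta, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided comparison of two
    proportions (normal approximation), where delta is the predicted
    absolute reduction in the event rate (hypothetical example)."""
    p_treat = p_control - delta
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # significance level
    z_beta = NormalDist().inv_cdf(power)            # statistical power
    p_bar = (p_control + p_treat) / 2
    n = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
         + z_beta * math.sqrt(p_control * (1 - p_control)
                              + p_treat * (1 - p_treat))) ** 2 / delta ** 2
    return math.ceil(n)

# Inflating delta from a modest 5-point to an optimistic 15-point
# mortality reduction shrinks the required sample size dramatically:
print(sample_size_per_group(0.40, 0.05))  # realistic delta -> large trial
print(sample_size_per_group(0.40, 0.15))  # inflated delta -> small trial
```

A trial sized for the inflated delta is then underpowered for the realistic one, which is exactly the mechanism behind the false negative findings described above.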
Another indicator of possible bias recently proposed is the “spin strategy”, defined as the use of specific reporting strategies that distort the interpretation of results and misguide readers [14]. Spin strategies include [14]: (1) focusing on secondary statistically significant results, such as statistically significant results from within-group comparisons, analyses of secondary outcomes, subgroup analyses, or modified analyses (e.g., per protocol analysis); (2) interpreting statistically non-significant results as demonstrating treatment equivalence or comparable effectiveness when the study had not been designed to assess equivalence or non-inferiority, designs that require a different statistical approach and larger sample sizes compared with classical superiority RCTs [15]; and (3) claiming or emphasizing the beneficial effect of the treatment despite statistically non-significant results.
In a previous review assessing the quality of reporting of RCTs published in Intensive Care Medicine, from its birth in 1975 to December 2000, the percentage of adequately reported RCTs according to the Jadad scale was only 25 % [16]. Intensive Care Medicine is nowadays recognized as one of the leading journals in the intensive care medicine field with a well-defined identity [17]. Its articles are widely cited in the medical literature, and its impact factor has risen since 2001 to rank second among intensive care journals.
The aim of this study was to compare the quality of reporting of RCTs published in Intensive Care Medicine from 2001 to 2010 with that described in our previous review for RCTs published from 1975 to 2000, using individual components of methodological quality of reporting as well as the Jadad scale.
For RCTs published from 2001 to 2010, we also evaluated the frequency of spin and delta inflation as further indicators of methodological quality. Finally, we tested the hypothesis that RCTs with higher Jadad scores are cited more often than those with lower scores.
Methods
In line with our previous study [16], this review includes all published RCTs that evaluated the efficacy of a treatment. RCTs evaluating diagnostic, management or educational strategies were excluded. Studies were identified by two independent assessors consulting the on-line archive of the Journal and Springer’s website, using the following search terms in the article’s title and abstract: “randomized controlled trial”, “controlled clinical trial”, “randomized”, “trial”, “randomly assigned”, “random order”, “randomization”, “placebo”, “drug therapy”.
Two independent reviewers assessed the studies using a standardized form, and discrepancies were resolved by discussion with a third reviewer until consensus was reached.
Assessment of individual methodological components
The quality of reporting of RCTs was assessed by evaluating three major methodological components: the randomization process, blinding, and reporting of the participant flow. Key analyzed elements of the randomization process were the description of the method used to generate an unpredictable sequence and its concealment until assignment (e.g., computerized random number generation of the sequence and its concealment in sealed, opaque, sequentially numbered or coded envelopes). Blinding was analyzed in terms of the strategy used to withhold information about the assigned interventions and to protect the randomization sequence after allocation. Explicit statements about the blinding status of the patients and study personnel involved in the RCT, such as clinicians, researchers, statisticians, or outcome assessors, were recorded. Double blinding of ICU personnel and patient was judged as not feasible (e.g., supine vs. prone positioning, different ventilator modalities, and use of devices) [18] based on an assessment of the type of intervention made by two expert intensivists (F.A.R., N.L.). Finally, the key analyzed elements of participant flow were the reporting of the number of patients randomly assigned to a treatment and of those who actually received the intended treatment, the number of patients analyzed for the primary outcome, and the number of patients excluded after randomization or lost to follow-up.
The Jadad scale
We used the Jadad scale for a synthetic representation of RCT quality and for comparison with the previous study period. The scale consists of five yes/no questions assessing three key items: randomization (two questions: 1. Was the study described as randomized? 2. Was the randomization scheme described and appropriate?), double blinding (two questions: 3. Was the study described as double-blind? 4. Was the method of double blinding described and appropriate?), and dropouts and withdrawals (one question: 5. Was there a description of dropouts and withdrawals?) [9]. Total scores range from 0 to 5, with scores ≥3 indicating good quality RCTs [8]. We maintained the distinction between RCTs with a Jadad score <3 and those with a score of ≥3 to allow comparison of the overall methodological quality between the two study periods.
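As a sketch, the five yes/no items and the ≥3 cut-off can be expressed as a simple scoring function; the question keys are our own illustrative labels, not the scale's official wording, and the deduction points of the full published scale are not modeled here:

```python
# The five yes/no items of the Jadad scale, as summarized above;
# the keys are illustrative labels, not the scale's official wording.
QUESTIONS = (
    "described_as_randomized",
    "randomization_described_and_appropriate",
    "described_as_double_blind",
    "blinding_described_and_appropriate",
    "dropouts_and_withdrawals_described",
)

def jadad_score(trial):
    """trial: dict mapping each question to True/False; one point per 'yes'."""
    return sum(bool(trial.get(q, False)) for q in QUESTIONS)

def is_good_quality(trial):
    """Scores of 3 or more were taken to indicate good quality."""
    return jadad_score(trial) >= 3
```

This yields the 0–5 range described above, with the same ≥3 threshold used to compare the two study periods.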
Analysis of spin and delta inflation
The RCTs reporting statistically non-significant results were examined for the presence of spin, and information was extracted on the specific spin strategy used by the authors [14].
We evaluated delta inflation only in RCTs with mortality as primary outcome for consistency with the original publication [13], with delta representing the treatment effect (difference in mortality between treatment and control group). The difference between predicted and observed delta was defined as delta-gap. Delta inflation was considered present if the predicted delta was outside the 95% confidence interval of the observed delta.
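This criterion can be sketched as follows, using a Wald 95 % confidence interval for a difference of two proportions; the exact interval method used by the authors is not specified, so the interval choice is an assumption, and the event counts in the example are invented:

```python
import math

def delta_inflation(predicted_delta, deaths_treat, n_treat,
                    deaths_control, n_control):
    """Return (inflated?, 95% CI of observed delta), where delta is the
    absolute mortality reduction (control minus treatment). Uses a Wald
    interval for the difference of proportions -- an assumed method."""
    p_t = deaths_treat / n_treat
    p_c = deaths_control / n_control
    observed = p_c - p_t
    se = math.sqrt(p_t * (1 - p_t) / n_treat + p_c * (1 - p_c) / n_control)
    lo, hi = observed - 1.96 * se, observed + 1.96 * se
    return not (lo <= predicted_delta <= hi), (lo, hi)

# Invented example: 40 % control vs. 38 % treatment mortality; a
# predicted 20-point reduction lies outside the observed 95 % CI.
inflated, ci = delta_inflation(0.20, 38, 100, 40, 100)
```

The larger the gap between predicted and observed delta (the delta-gap), the more likely the predicted effect falls outside the interval and the trial is flagged as delta-inflated.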
Other information extracted
The following information was also extracted: sex and age of participants; industry support (classified as total industry funding, in-kind contribution from industry or duality of interest) [19]; presence of parallel groups (yes/no); number of intervention groups (two or more); characteristics of the control group (placebo or active treatment); number of patients included; blinding status (with specification of who were blinded); pre-specified primary outcome; “a priori” calculation of the sample size; type of outcome considered (mortality or specific outcomes). Outcomes were further classified as objectively or subjectively assessed, according to the extent to which outcome assessment could be influenced by the investigators’ judgment. Objectively assessed outcomes included all-cause mortality, outcomes based on a laboratory measurement (e.g., pH, PaO2, cardiac index) and outcomes based on other objective measures (e.g., duration of ICU or hospital stay). Subjectively assessed outcomes included physician assessed outcomes (e.g., ventilator-associated pneumonia, acute respiratory distress syndrome), outcomes based on a combination of several measures (e.g., multiple hemodynamic or respiratory parameters), patient reported outcomes (e.g., post-traumatic stress disorder-related symptoms, pain scoring) [12].
We also obtained the total cumulative citation counts of each paper included in our review using three different sources, Web of Science (Thomson Reuters. ISI Web of Knowledge Web site. http://www.isiwebofknowledge.com), Scopus (Elsevier. Scopus Web site. http://www.scopus.com) and Google Scholar (Google. Google Scholar beta Web site. http://scholar.google.com), to test the hypothesis that RCTs with Jadad score ≥3 are cited more often than RCTs with lower scores. In August 2012, two of us (S.P., C.M.) independently determined the total number of citations to date for all articles according to the Web of Science’s Science Citation Index, Scopus, and Google Scholar using the Digital Object Identifier (DOI) to uniquely identify each article. No discrepancies were found in the citation counts retrieved by the two investigators. The maximum difference in time between assessments of any of the three databases was 7 days for all articles.
Data presentation and statistical analysis
We expressed continuous variables as means (standard deviation, SD) or medians (interquartile range, IQR) and discrete variables as counts (percentage), unless otherwise stated. Differences between groups were analyzed by means of a Student’s t test, Mann–Whitney U test, and chi square test (or Fisher exact test), as appropriate. The presence of a time trend in the use of spin strategies was investigated by logistic regression testing the association of spin (any strategy) with year of publication. The association between number of citations and RCT quality was tested using Poisson regression with robust standard error, and the model was adjusted by year of publication (categorical variable with 5 levels, corresponding to two years each over the 10-year period).
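The citation model can be sketched as a log-link Poisson regression with Huber–White (robust) standard errors; the fitting routine and the simulated data below are illustrative only and do not reproduce the study's dataset or its STATA implementation:

```python
import numpy as np

def poisson_robust(X, y, iters=25):
    """Poisson regression (log link) fitted by Newton-Raphson, with
    Huber-White sandwich standard errors -- a minimal sketch, not the
    routine used in the study."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())            # sensible starting intercept
    for _ in range(iters):
        mu = np.exp(X @ beta)             # fitted means
        info = X.T @ (mu[:, None] * X)    # Fisher information
        beta = beta + np.linalg.solve(info, X.T @ (y - mu))
    mu = np.exp(X @ beta)
    bread = np.linalg.inv(X.T @ (mu[:, None] * X))
    meat = X.T @ (((y - mu) ** 2)[:, None] * X)
    robust_se = np.sqrt(np.diag(bread @ meat @ bread))
    return beta, robust_se

# Simulated (not real) data: citation counts vs. a Jadad >= 3 indicator,
# with a true citation rate ratio of exp(0.28), i.e. about +32 %.
rng = np.random.default_rng(0)
jadad_high = rng.integers(0, 2, 200)
citations = rng.poisson(np.exp(2.0 + 0.28 * jadad_high))
X = np.column_stack([np.ones(200), jadad_high.astype(float)])
beta, se = poisson_robust(X, citations)
rate_ratio = float(np.exp(beta[1]))       # citations multiplier for Jadad >= 3
```

Exponentiating the coefficient of the quality indicator gives the rate ratio, i.e. the multiplicative change in expected citations associated with a Jadad score ≥3.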
Tests were two-tailed, and P ≤ 0.05 was considered as significant. The data were analyzed with STATA 9.0 (Stata, College Station, TX, USA).
Results
From January 2001 to December 2010, 233 RCTs were published in Intensive Care Medicine, of which 221 (95 %) were included in the analysis (Fig. 1; supplemental e-Table). The design characteristics are reported in Table 1. Mortality was the primary outcome in 17 RCTs (8 %). The mean number of RCTs published yearly was 22 (range 14–30), significantly higher than in the previous period between 1975 and 2000 (22 vs. 9; t test: P < 0.001). Sample size was also significantly larger than in the previous period (median 42, IQR 20–100, absolute range 5–1,101, vs. median 30, IQR 20–64; Mann–Whitney test: P = 0.048). Yet, one-third of RCTs had 20 patients or fewer and 10 % had 10 patients or fewer.
The reporting of the individual methodological components is presented in Table 2, where findings are compared with those from our previous review. Studies with statistically non-significant results were more common than in the previous period (52 vs. 17 %; χ2 test: P < 0.001). Reporting of the rationale for sample size estimation and allocation concealment increased significantly, but reporting of other important individual methodological components did not change substantially compared with the previous period, and remained low, varying from 12 % for the description of the method used to ensure blinding to 57 % for description of withdrawals.
Among 69 RCTs (31 %) reporting blinding, 4 studies reported triple blinding (patient, researcher and assessor blinded in 3 studies; patient, researcher and statistician blinded in 1 study), 32 reported double blinding (patient and researcher blinded in 25 studies; researcher and statistician blinded in 3 studies; outcome assessor and statistician blinded in 1 study; double blinding not further specified in 3 studies), and 33 reported single blinding (blinding of the patient in 11 studies, researcher in 16 studies, data analyst in 5 studies, and outcome assessor in 1 study) (supplemental e-Figure).
Among 152 RCTs (69 %) not reporting blinding, 81 did not report the primary outcome; of the 71 that reported it, the primary outcome was objectively assessed in 42 (mortality: 14; other outcomes: 28). Double blinding of the ICU personnel and patient was judged as not feasible in 110 (supplemental e-Figure).
Among the 151 RCTs not reporting allocation concealment, 83 did not report the primary outcome; of the 68 RCTs that reported it, the primary outcome was objectively assessed in 30 (mortality: 5; other outcomes: 25).
Among RCTs published between 2001 and 2010, the proportion of studies with Jadad score ≥3 was not significantly higher than among RCTs published between 1975 and 2000 (30 vs. 26 %; χ2 test: P = 0.40), and it increased only slightly (37 %) after exclusion of RCTs where double blinding of ICU personnel and patient was judged as not feasible. Among RCTs in which double blinding was not feasible, blinding of the data analyst was reported in only one study.
Spin strategy was evaluated among 111 RCTs (50 %) published in the period 2001–2010 that reported statistically non-significant results. A spin strategy was used in 69 (62 %): 43 interpreted statistically non-significant results for the primary outcome as a demonstration of treatment equivalence or comparable effectiveness, 21 focused on secondary statistically significant results, and 5 claimed or emphasized the beneficial effect of the treatment despite statistically non-significant results. A logistic regression analysis showed no association between presence of spin and year of publication (P = 0.35).
Delta inflation was evaluated in 11 RCTs published in the period 2001–2010 that had survival as a primary outcome and reported both predicted and observed delta. Figure 2 shows evidence of delta inflation in 7 RCTs (64 %), where the predicted deltas are consistently higher and outside the 95 % CI of the observed deltas.
Among RCTs published in the period 2001–2010, the number of citations was higher for RCTs with Jadad score ≥3 compared with those with Jadad score <3. The Poisson regression model adjusted for year of publication showed an increase in citations associated with a Jadad score ≥3 of 32 % (95 % CI 1–71 %; P = 0.04) for Web of Science, 31 % (95 % CI 1–69 %; P = 0.04) for Scopus, and 32 % (95 % CI 2–71 %; P = 0.04) for Google Scholar. We found no relationship between the presence of spin and the number of citations in Web of Science, Scopus, or Google Scholar (P values of 0.42, 0.50, and 0.42, respectively).
Discussion
We analyzed the quality of reporting of RCTs published in Intensive Care Medicine from 2001 to 2010, and compared it with a previous analysis of RCTs published in the Journal from 1975 to 2000 [16]. The total number of RCTs increased significantly in the last 10 years compared with the previous 25 years. However, we did not observe a similar trend in the quality of reporting. Reporting of the rationale for sample size estimation and allocation concealment increased significantly, but other important quality indicators such as randomization, blinding, and specification of the primary outcome were reported in only one-third to one-half of published RCTs, with no substantial differences from the previous period. The sample size increased significantly, yet one-third of RCTs had 20 patients or fewer and 10 % had 10 patients or fewer. We also documented distorted presentation of results and inflated size of treatment effect in a considerable number of RCTs published from 2001 to 2010.
Our results are in agreement with findings from previous studies that found low rates of reporting of important indicators of methodological quality in RCTs published in various specialty and general medical journals [20–28]. A recent Cochrane review described the inadequate reporting of essential elements of the methodological quality of published RCTs as a serious endemic problem hindering research utilization in clinical practice and further research [29]. RCTs are the gold standard in evaluating health care interventions, but rigorous methodology is of crucial importance to ensure unbiased comparisons [2]. To assess a trial accurately, readers of a published report need complete, clear, and transparent information on its methodology and findings. Lack of adequate reporting has promoted the development of the CONSORT (Consolidated Standards of Reporting Trials) statement in 1996 and its later revisions [30], and an increased publication of reviews aimed to assess the quality of the reporting of published RCTs [31]. Our findings show that quality of reporting of RCTs published in Intensive Care Medicine in the “after-CONSORT” period did not differ substantially compared to the earlier period. Similar results have been described in other general and specialty journals [28, 31], suggesting that specific measures are needed to increase the adoption of the CONSORT recommendations by authors, reviewers, and editors [17, 32].
RCTs with inadequate or unclear random-sequence generation, inadequate or unclear allocation concealment, or lack of or unclear blinding tend to exaggerate estimates of treatment effects, especially when assessing subjective outcomes [4, 6, 7, 12]. In our analysis, among RCTs with a specified primary outcome that did not report allocation concealment or blinding, more than half used subjectively assessed outcomes, making inflated estimates of treatment effect likely. As an important remark, blinding of care givers and patients to treatment may sometimes not be feasible in RCTs exploring health care interventions in critical care medicine. In such cases, an uncritical evaluation of the quality of reporting may generate over-pessimistic estimates of methodological quality. However, blinding of the outcome assessors, data collectors, or data analysts is always possible, and it is crucial to ensure unbiased ascertainment of the outcome and unbiased treatment effect estimates [33]. Lack of blinding can introduce bias if knowledge of the treatment received affects patient care or outcome assessment [12]. Blinding can reduce bias in RCTs, particularly in those with subjective outcomes [34]. Therefore, “blinding of as many individuals as is practically possible” should always be done [33]. Among the 110 RCTs where double blinding of ICU personnel and patient was not feasible, only one study reported blinding of the data analyst. In such cases, it would also be desirable to specify at least one objectively assessed outcome, even if the outcome of primary interest is subjective [12].
Reporting of the sample size calculation has greatly increased in the past decades, from 4 % in 1980 to 83 % in 2002 [35–37]. In a recent review of general medical journals with high impact factors, only 5 % of RCTs did not report any sample size calculation [35]. Despite this, calculations were frequently based on inaccurate assumptions about the control group and were often erroneous [35]. A priori calculation of sample size is intended to provide a sample size that is large enough to detect a postulated size of treatment effect with reasonable confidence [38]. We found that methods and assumptions used for sample size estimation were reported in 43 % of RCTs published between 2001 and 2010. Although sample sizes were significantly larger compared with those in the period 1975–2000, they remained small. As expected, overestimation of the event rate in the treatment group, a major determinant of sample size calculation, was commonplace; in fact, we documented an inflated predicted treatment effect in 7 of 11 RCTs (64 %) reporting both predicted and observed treatment effects on mortality. Aberegg et al. [13] found similar proportions (68 %) in 38 RCTs published in high impact factor journals evaluating the effect of therapies on critical care mortality. Underpowered trials are often viewed as a major problem in clinical research. RCTs that are too small can be misleading, either by missing realistically moderate treatment effects that would be clinically important [38], or by over-estimating the size of treatment effect and finding it statistically significant purely by chance [8, 38]. Moreover, neglecting to report sample size calculations suggests methodological weakness [39, 40]. In contrast, others view underpowered trials as a potential resource, since they may still convey relevant information that can be incorporated into systematic reviews and meta-analyses, provided that bias is avoided and reporting is exhaustive [4, 9, 35, 38–41].
In assessing the quality of RCTs, we relied mostly on the analysis of individual components of methodological quality rather than on the Jadad scale. This scale is the only one that has been developed using established standards, and low scores have been associated with increased effect estimates [42]. However, despite its thorough development and validation, the scale is problematic for several reasons, including the fact that it penalizes research areas where blinding to treatment may not be feasible, such as critical care medicine or surgery [42]. In our study, RCTs in which double blinding was not feasible received an over-pessimistic evaluation with the Jadad scale, which should caution against the use of this scale as a single method to synthesize the methodological quality of reporting of RCTs. With this limitation in mind, we found no improvement over time in terms of Jadad scores.
A spin strategy was used in a considerable proportion of RCTs with statistically non-significant primary outcomes published in the period 2001–2010, with no changes over time. Reasons for spin remain speculative, possibly including the pursuit of personal needs or corporate economic interests, which may increase public distrust in science [43]. Authors might use spin strategies simply to increase the chance of publication, but our findings suggest that this may be futile as well as wrong. In fact, we observed a significant increase over time in the proportion of RCTs with statistically non-significant findings published in Intensive Care Medicine, which is in line with current recommendations for journal publication policies [44]. Sometimes, however, it may be difficult to distinguish spin from genuine mistakes in interpreting studies with “negative results”. For example, a common error in the medical literature is the interpretation of absence of evidence as evidence of absence [45], which might explain the interpretation of non-significant findings as a demonstration of treatment equivalence by the authors of some of the RCTs. Indeed, establishing therapeutic equivalence between treatments of proven efficacy requires specifically designed RCTs with predefined equivalence margins [46]. Although the specific impact of spin on the interpretation of RCT findings by peer reviewers and readers is unclear, “the fairness of results reporting” has recently been shown to play an essential role in physicians’ evaluation of a trial’s validity, influencing their willingness to believe and act on the trial findings [47, 48].
Intensive Care Medicine’s impact factor more than doubled over the last decade, and yet the quality of reporting of its RCTs did not improve substantially. The impact factor is often used as an indicator of the quality of science published in a journal, but in fact it depends on a number of elements that go beyond scientific aspects [49, 50], including the role of industry-supported research [51–53] and editorial strategies that influence its calculation (e.g., reduction of citable articles) [53, 54]. Evidence on the quality of published articles and their methodological weaknesses is therefore important for ensuring a journal’s improvement over time through targeted actions, such as introduction or maintenance of specific methodological requirements for authors and reviewers.
Citation of scientific articles by other researchers is an important indicator of the dissemination of research findings, as it reflects the impact of the article within the scientific community. However, whether the number of citations reflects the methodological quality of a paper has been questioned, because other factors, including the journal’s reputation [55–57], the country where the research was done [58], or the finding of positive results [56, 57], may count more than the design merits of the study. We found that better reporting was associated with a statistically significant increase in citations of about a third in all three databases considered, Web of Science, Scopus, and Google Scholar.
In conclusion, our analysis reveals that the quality of reporting of RCTs published in Intensive Care Medicine between 2001 and 2010 has not substantially improved compared with previous years. Adherence to CONSORT recommendations, with special emphasis on accurate description of randomization and blinding, and correct reporting and discussion of results in RCTs with statistically non-significant findings, is recommended. Improved quality of methodological reporting may help to select articles with greater impact on science, and may have beneficial effects on Journal citation.
References
Moher D, Jadad AR, Nichol G, Penman M, Tugwell P, Walsh S (1995) Assessing the quality of randomized controlled trials: an annotated bibliography of scales and checklists. Control Clin Trials 16:62–73
Schulz KF, Chalmers I, Hayes RJ, Altman DG (1995) Empirical evidence of bias. Dimensions of methodological quality associated with estimates of treatment effects in controlled trials. JAMA 273:408–412
Moher D, Pham B, Jones A, Cook DJ, Jadad AR, Moher M, Tugwell P, Klassen TP (1998) Does quality of reports of randomised trials affect estimates of intervention efficacy reported in meta-analyses? Lancet 352:609–613
Altman DG, Schulz KF, Moher D, Egger M, Davidoff F, Elbourne D, Gotzsche PC, Lang T (2001) The revised CONSORT statement for reporting randomized trials: explanation and elaboration. Ann Intern Med 134:663–694
Kjaergard LL, Villumsen J, Gluud C (2001) Reported methodologic quality and discrepancies between large and small randomized trials in meta-analyses. Ann Intern Med 135:982–989
Hrobjartsson A, Thomsen AS, Emanuelsson F, Tendal B, Hilden J, Boutron I, Ravaud P, Brorson S (2012) Observer bias in randomised clinical trials with binary outcomes: systematic review of trials with both blinded and non-blinded outcome assessors. BMJ 344:e1119
Savovic J, Jones HE, Altman DG, Harris RJ, Juni P, Pildal J, Als-Nielsen B, Balk EM, Gluud C, Gluud LL, Ioannidis JPA, Schulz KF, Beynon R, Welton NJ, Wood L, Moher D, Deeks JJ, Sterne JA (2012) Influence of reported study design characteristics on intervention effect estimates from randomized, controlled trials. Ann Intern Med
Jüni P, Altman DG, Egger M (2008) Assessing the quality of randomised controlled trials. In: Systematic reviews in health care. BMJ, London, pp 87–108
Jadad AR, Moore RA, Carroll D, Jenkinson C, Reynolds DJ, Gavaghan DJ, McQuay HJ (1996) Assessing the quality of reports of randomized clinical trials: is blinding necessary? Control Clin Trials 17:1–12
Altman DG (2001) Systematic reviews of evaluations of prognostic variables. BMJ 323:224–228
Huwiler-Muntener K, Juni P, Junker C, Egger M (2002) Quality of reporting of randomized trials as a measure of methodologic quality. JAMA 287:2801–2804
Wood L, Egger M, Gluud LL, Schulz KF, Juni P, Altman DG, Gluud C, Martin RM, Wood AJ, Sterne JA (2008) Empirical evidence of bias in treatment effect estimates in controlled trials with different interventions and outcomes: meta-epidemiological study. BMJ 336:601–605
Aberegg SK, Richards DR, O’Brien JM (2010) Delta inflation: a bias in the design of randomized controlled trials in critical care medicine. Crit Care 14:R77
Boutron I, Dutton S, Ravaud P, Altman DG (2010) Reporting and interpretation of randomized controlled trials with statistically nonsignificant results for primary outcomes. JAMA 303:2058–2064
Piaggio G, Elbourne DR, Pocock SJ, Evans SJ, Altman DG (2012) Reporting of noninferiority and equivalence randomized trials: extension of the CONSORT 2010 statement. JAMA 308:2594–2604
Latronico N, Botteri M, Minelli C, Zanotti C, Bertolini G, Candiani A (2002) Quality of reporting of randomised controlled trials in the intensive care literature. A systematic analysis of papers published in Intensive Care Medicine over 26 years. Intensive Care Med 28:1316–1323
Azoulay E, Citerio G, Timsit JF (2013) The identity of Intensive Care Medicine. Intensive Care Med 39:343–344
Sud S, Sud M, Friedrich JO, Meade MO, Ferguson ND, Wunsch H, Adhikari NK (2010) High frequency oscillation in patients with acute lung injury and acute respiratory distress syndrome (ARDS): systematic review and meta-analysis. BMJ 340:c2327
Felder TM, Palmer NR, Lal LS, Mullen PD (2011) What is the evidence for pharmaceutical patient assistance programs? A systematic review. J Health Care Poor Underserved 22:24–49
Poolman RW, Struijs PA, Krips R, Sierevelt IN, Lutz KH, Bhandari M (2006) Does a “Level I Evidence” rating imply high quality of reporting in orthopaedic randomised controlled trials? BMC Med Res Methodol 6:44
Lai TY, Wong VW, Lam RF, Cheng AC, Lam DS, Leung GM (2007) Quality of reporting of key methodological items of randomized controlled trials in clinical ophthalmic journals. Ophthalmic Epidemiol 14:390–398
Sut N, Senocak M, Uysal O, Koksalan H (2008) Assessing the quality of randomized controlled trials from two leading cancer journals using the CONSORT statement. Hematol Oncol Stem Cell Ther 1:38–43
Bai Y, Gao J, Zou DW, Li ZS (2009) Methodological reporting of randomized clinical trials in major gastroenterology and hepatology journals in 2006. Hepatology 49:2108–2112
Danilla S, Wasiak J, Searle S, Arriagada C, Pedreros C, Cleland H, Spinks A (2009) Methodological quality of randomised controlled trials in burns care. A systematic review. Burns 35:956–961
Hopewell S, Dutton S, Yu LM, Chan AW, Altman DG (2010) The quality of reports of randomised trials in 2000 and 2006: comparative study of articles indexed in PubMed. BMJ 340:c723
Strech D, Soltmann B, Weikert B, Bauer M, Pfennig A (2011) Quality of reporting of randomized controlled trials of pharmacologic treatment of bipolar disorders: a systematic review. J Clin Psychiatr 72:1214–1221
Agha RA, Camm CF, Edison E, Orgill DP (2012) The methodological quality of randomized controlled trials in plastic surgery needs improvement: a systematic review. J Plast Reconstr Aesthet Surg
Mills EJ, Wu P, Gagnier J, Devereaux PJ (2005) The quality of randomized trial reporting in leading medical journals since the revised CONSORT statement. Contemp Clin Trials 26:480–487
Turner L, Shamseer L, Altman DG, Weeks L, Peters J, Kober T, Dias S, Schulz KF, Plint AC, Moher D (2012) Consolidated standards of reporting trials (CONSORT) and the completeness of reporting of randomised controlled trials (RCTs) published in medical journals. Cochrane Database Syst Rev 11:MR000030
Schulz KF, Altman DG, Moher D (2010) CONSORT 2010 statement: updated guidelines for reporting parallel group randomized trials. Ann Intern Med 152:726–732
Dechartres A, Charles P, Hopewell S, Ravaud P, Altman DG (2011) Reviews assessing the quality or the reporting of randomized controlled trials are increasing over time but raised questions about how quality is assessed. J Clin Epidemiol 64:136–144
Hirst A, Altman DG (2012) Are peer reviewers encouraged to use reporting guidelines? A survey of 116 health research journals. PLoS ONE 7:e35621
Karanicolas PJ, Farrokhyar F, Bhandari M (2010) Practical tips for surgical research: blinding: who, what, when, why, how? Can J Surg 53:345–348
Schulz KF, Altman DG, Moher D, Fergusson D (2010) CONSORT 2010 changes and testing blindness in RCTs. Lancet 375:1144–1146
Charles P, Giraudeau B, Dechartres A, Baron G, Ravaud P (2009) Reporting of sample size calculation in randomised controlled trials: review. BMJ 338:b1732
Moher D, Fortin P, Jadad AR, Juni P, Klassen T, Le Lorier J, Liberati A, Linde K, Penna A (1996) Completeness of reporting of trials published in languages other than English: implications for conduct and reporting of systematic reviews. Lancet 347:363–366
Schulz KF, Grimes DA (2002) Allocation concealment in randomised trials: defending against deciphering. Lancet 359:614–618
Guyatt GH, Mills EJ, Elbourne D (2008) In the era of systematic reviews, does the size of an individual trial still matter? PLoS Med 5:e4
Schulz KF, Grimes DA (2005) Multiplicity in randomised trials II: subgroup and interim analyses. Lancet 365:1657–1661
Schulz KF, Grimes DA (2005) Multiplicity in randomised trials I: endpoints and treatments. Lancet 365:1591–1595
Schulz KF, Grimes DA (2005) Sample size calculations in randomised trials: mandatory and mystical. Lancet 365:1348–1353
Lundh A, Gotzsche PC (2008) Recommendations by Cochrane Review Groups for assessment of the risk of bias in studies. BMC Med Res Methodol 8:22
Hyman M (2010) Science for sale: protect yourself from medical research deception. http://www.huffingtonpost.com/dr-mark-hyman/dangerous-spin-doctors-7-_b_747325.html. Accessed 28 June 2012
International Committee of Medical Journal Editors (2013) Uniform requirements for manuscripts submitted to biomedical journals: publishing and editorial issues related to publication in biomedical journals: obligation to publish negative studies. http://www.icmje.org/publishing_1negative.html. Accessed 2 April 2013
Altman DG, Bland JM (1995) Absence of evidence is not evidence of absence. BMJ 311:485
Powers JH (2008) Noninferiority and equivalence trials: deciphering ‘similarity’ of medical interventions. Stat Med 27:343–352
Drazen JM (2012) Believe the data. N Engl J Med 367:1152–1153
Kesselheim AS, Robertson CT, Myers JA, Rose SL, Gillet VBA, Ross KM, Glynn RJ, Joffe S, Avorn J (2012) A randomized study of how physicians interpret research funding disclosures. N Engl J Med 367:1119–1127
(2005) In praise of soft science. Nature 435:1003
Bonati MR, Drusini AG (1996) Morgagni and the impact factor. Nature 381:271
Smith R (2003) Medical journals and pharmaceutical companies: uneasy bedfellows. BMJ 326:1202–1205
Smith R (2005) Medical journals are an extension of the marketing arm of pharmaceutical companies. PLoS Med 2:e138
Lundh A, Barbateskovic M, Hrobjartsson A, Gotzsche PC (2010) Conflicts of interest at medical journals: the influence of industry-supported randomised trials on journal impact factors and revenue—cohort study. PLoS Med 7:e1000354
McVeigh ME, Mann SJ (2009) The journal impact factor denominator: defining citable (counted) items. JAMA 302:1107–1109
Callaham M, Wears RL, Weber E (2002) Journal prestige, publication bias, and other characteristics associated with citation of published studies in peer-reviewed journals. JAMA 287:2847–2850
Nieminen P, Carpenter J, Rucker G, Schumacher M (2006) The relationship between quality of research and citation frequency. BMC Med Res Methodol 6:42
Etter JF, Stapleton J (2009) Citations to trials of nicotine replacement therapy were biased toward positive results and high-impact-factor journals. J Clin Epidemiol 62:831–837
Filion KB, Pless IB (2008) Factors related to the frequency of citation of epidemiologic publications. Epidemiol Perspect Innov 5:3
Acknowledgments
We thank Prof. Massimo Antonelli, former Editor-in-Chief, and Prof. Jordi Mancebo, former Reviews Deputy Editor of Intensive Care Medicine, who suggested the investigation into the association between methodological quality of randomized controlled trials and frequency of citation.
Latronico, N., Metelli, M., Turin, M. et al. Quality of reporting of randomized controlled trials published in Intensive Care Medicine from 2001 to 2010. Intensive Care Med 39, 1386–1395 (2013). https://doi.org/10.1007/s00134-013-2947-3