Directory of clinical databases: improving and promoting their use
  1. N Black,
  2. M Payne,
  3. on behalf of the DoCDat Development Group
  1. Department of Public Health & Policy, London School of Hygiene & Tropical Medicine, London WC1E 7HT, UK
  1. Correspondence to:
 Professor Nick Black, Professor of Health Services Research, Department of Public Health & Policy, London School of Hygiene & Tropical Medicine, Keppel Street, London WC1E 7HT, UK


Background: The controversy surrounding the actual and potential use of clinical databases partly reflects the huge variation in their content and quality. In addition, use of existing clinical databases is severely limited by a lack of knowledge of their availability.

Objectives: To develop and test a standardised method for assessing the quality (completeness and accuracy) of clinical databases and to establish a web based directory of databases in the UK.

Methods: An expert group was set up (1) to establish the criteria for inclusion of databases; (2) to develop a quality assessment instrument with high content validity, based on epidemiological theory; (3) to test empirically, modify, and retest the acceptability to database custodians, face validity and floor/ceiling effects; and (4) to design a website.

Results: Criteria for inclusion of databases were the provision of individual level data; inclusion in the database defined by a common circumstance (e.g. condition, treatment), an administrative arrangement, or an adverse outcome; and inclusion of data from more than one provider. A quality assessment instrument consisting of 10 items (four on coverage, six on reliability and validity) was developed and shown to have good face and content validity, no floor/ceiling effects, and to be acceptable to database custodians. A website was developed. Indications over the first 18 months (number of visitors to the site) are that it is increasingly popular. By November 2002 there were around 3500 hits a month.

Conclusions: A website now exists where visitors can identify clinical databases in the UK that may be suitable to meet their aims. It is planned both to develop a local version for use within a hospital and to encourage similar national systems in other countries.

  • clinical audit
  • clinical governance
  • database quality
  • world wide web

The need for high quality clinical databases has been thoroughly documented.1–3 Briefly, they offer the opportunity to carry out evaluative research and clinical audit, to inform the planning and management of services, and provide individual clinicians with accurate estimates of the outcome of their care (assuming an accurate prognostic model exists which can be shared with prospective patients).

Despite these potential benefits, clinical databases have generally had few supporters and have attracted considerable scepticism and criticism. Some of this has been justified by the existence of some poor quality clinical databases and by the overambitious and inappropriate uses to which such databases have sometimes been put.4,5 While some advocates of clinical databases have exaggerated their usefulness, some critics have based their beliefs on their familiarity with a few poor quality examples or on an impractical adherence to purity, preferring no information to data that may not reach their desired level of perfection.

Much of the confusion that exists regarding the value of clinical databases arises from a tendency to treat them all alike. As with all forms of information or methods of enquiry, both good and bad examples exist. Some clinical databases are of high quality and can be used for sophisticated evaluative research—for example, determining the clinical outcome of premature discharge from intensive care6—while others are limited to providing basic information that allows only descriptive comparisons of levels of activity (such as, in England, Hospital Episode Statistics7). Failure to appreciate such diversity has led both to the underuse and the misuse of databases.

It is not surprising that a consequence of the claims and counterclaims about the value of clinical databases is that those judging the appropriateness of their use are, at best, confused and, at worst, apprehensive or even dismissive of them. They include research funders, journal editors, health services managers, politicians, and journalists.

In an attempt to promote both the quality of clinical databases and their appropriate use, we have created a website on which visitors can find out which databases exist (initially restricted to the UK) and obtain an independent assessment of their scope and quality. To provide that assessment we first had to develop and test a standardised method for assessing clinical databases. While similar exercises have been conducted for other methodologies such as randomised trials,8 previous work in the area of databases has been restricted to identifying the relevant dimensions rather than producing an operational instrument for practical use.1,9–11 Our objectives were therefore to develop and test a standardised method for assessing the quality of clinical databases and to establish a web based directory of databases in the UK.


There were four phases to the work:

  • establishment of criteria for inclusion of clinical databases;

  • development of a checklist for assessing the quality of clinical databases;

  • empirical testing of the checklist; and

  • design of a website.

The establishment of criteria to determine which clinical databases would be included was undertaken by an expert group comprising health services epidemiologists and statisticians, clinicians (from intensive care, nephrology, cardiac surgery, obstetrics and gynaecology, haematology, general surgery, spinal injuries), research funders, and information specialists (see Acknowledgements).

The development of the quality assessment checklist was based on the approach used to develop CONSORT, the checklist assessing the quality of randomised trials.8 Consensus was sought through a series of meetings of the expert group during 1999 and 2000. Agreement within this diverse group would be an indication of the content validity of the draft instrument. The checklist had to cover all methodological threats to the reliability, internal validity, and generalisability of a database. It also had to be straightforward enough to be of practical value and applicable to a wide variety of databases.

During spring 2001 the instrument was tested by a Masters level graduate in epidemiology on a sample of 20 clinical databases selected to represent the range of those available. Our objectives were to assess its face validity, its acceptability to database custodians, and the existence of floor and ceiling effects resulting from our definitions of the four levels of each item. Following modifications, the instrument was retested by the same person, who reapplied it to the 20 databases.

Finally, a website was designed that described each clinical database and its management and provided an independent assessment of its quality (using the checklist).


Scope of databases included

The scope of the databases that would be eligible for inclusion was restricted to those which included information on the recipients of health care. We excluded databases in which information was limited to the provision of resources or services such as a register of hospital bed provision, useful though such data are in studying and managing health services. We included databases—based on retrospectively and prospectively collected data—that met the following criteria:

  • provision of individual level data (whether or not users of the database are permitted to know the identity of individuals);

  • inclusion in the database is defined by a common circumstance (for example, individual’s condition, intervention required or undergone (which might be a diagnostic test, treatment, or a collection of interventions such as intensive care)), by an administrative arrangement (for example, registered with a general practitioner, target for immunisation, subscriber to health insurance), or by an adverse outcome (such as maternal death);

  • inclusion of data from more than one provider of health care (usually many providers in a region or country).

As our concern was health services, we excluded databases of cohorts defined by an environmental exposure, although such data are useful for aetiological research.
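The inclusion criteria above amount to a simple eligibility rule. The following is a minimal sketch of that rule, not the procedure the expert group used; the names `InclusionBasis`, `CandidateDatabase`, and `is_eligible` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class InclusionBasis(Enum):
    """How membership of a database is defined (illustrative labels)."""
    COMMON_CIRCUMSTANCE = "common circumstance"  # condition or intervention
    ADMINISTRATIVE_ARRANGEMENT = "administrative arrangement"
    ADVERSE_OUTCOME = "adverse outcome"
    ENVIRONMENTAL_EXPOSURE = "environmental exposure"  # excluded cohorts


# The three bases named in the inclusion criteria.
ELIGIBLE_BASES = frozenset({
    InclusionBasis.COMMON_CIRCUMSTANCE,
    InclusionBasis.ADMINISTRATIVE_ARRANGEMENT,
    InclusionBasis.ADVERSE_OUTCOME,
})


@dataclass(frozen=True)
class CandidateDatabase:
    """Hypothetical summary of a database being considered for the directory."""
    name: str
    individual_level_data: bool  # False for resource registers such as bed counts
    basis: InclusionBasis
    provider_count: int  # number of health care providers contributing data


def is_eligible(db: CandidateDatabase) -> bool:
    """Apply the three inclusion criteria; all must hold."""
    return (
        db.individual_level_data
        and db.basis in ELIGIBLE_BASES
        and db.provider_count > 1
    )
```

Under this sketch, a single-hospital audit fails on the provider criterion, and an exposure cohort fails on the basis criterion even when it holds individual level data from many providers.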

Methodological considerations for assessing quality

Two main methodological concerns were identified by the expert group when selecting a database—its coverage and its accuracy.

A potential user needs to know about four aspects of coverage to determine whether or not a database is suitable for the proposed task:

  • how representative it is of the country as a whole;

  • the extent to which all eligible individuals have been included;

  • the extent of the dataset on each individual; and

  • the completeness of data collection.

Similarly, accuracy can be defined in terms of six aspects:

  • the form in which quantitative data are collected;

  • the use of explicit definitions of variables;

  • the use of explicit rules for data collection;

  • reliability of data coding;

  • the independence of observations of outcomes; and

  • the extent to which data are validated.

For each of these 10 items we defined four possible levels of attainment: level 1 describes the simplest and least rigorous, level 4 the most sophisticated and most rigorous.
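One way to hold an assessment made with this instrument is a mapping from each of the 10 items to a level from 1 to 4. The sketch below assumes short item keys of our own choosing (the published instrument's wording is not reproduced here), and `validate_assessment` is a hypothetical helper.

```python
from typing import Dict, Mapping

# Four coverage items (keys are illustrative shorthand for the aspects listed above).
COVERAGE_ITEMS = (
    "representativeness",    # how representative of the country as a whole
    "eligible_individuals",  # extent to which all eligible individuals are included
    "dataset_extent",        # extent of the dataset on each individual
    "data_completeness",     # completeness of data collection
)

# Six accuracy items.
ACCURACY_ITEMS = (
    "quantitative_data_form",  # form in which quantitative data are collected
    "explicit_definitions",    # explicit definitions of variables
    "explicit_rules",          # explicit rules for data collection
    "coding_reliability",      # reliability of data coding
    "outcome_independence",    # independence of observations of outcomes
    "data_validation",         # extent to which data are validated
)

ALL_ITEMS = COVERAGE_ITEMS + ACCURACY_ITEMS
MIN_LEVEL, MAX_LEVEL = 1, 4  # level 1 least rigorous, level 4 most rigorous


def validate_assessment(levels: Mapping[str, int]) -> Dict[str, int]:
    """Check that every item is present exactly once with a level in 1..4."""
    missing = [item for item in ALL_ITEMS if item not in levels]
    unknown = [item for item in levels if item not in ALL_ITEMS]
    if missing or unknown:
        raise ValueError(f"missing items: {missing}; unknown items: {unknown}")
    for item, level in levels.items():
        if not isinstance(level, int) or not MIN_LEVEL <= level <= MAX_LEVEL:
            raise ValueError(f"{item}: level must be an integer from 1 to 4")
    # Return the levels in the instrument's item order.
    return {item: levels[item] for item in ALL_ITEMS}
```

Keeping the levels as ordered categories rather than summing them into a single score matches the description above, which defines levels per item and does not mention an overall total.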

Empirical testing and modification

The draft instrument was tested by interviewing the custodians of 20 databases (representing a range of areas of health care and geographical coverage). As a result, the face validity was improved. Several terms had to be clarified such as the distinction between immediate and long term outcomes. The attempt to distinguish between mandatory and optional variables included in the database had to be abandoned in the face of lack of clarity on the part of database custodians. For some items the boundaries between the four levels had to be altered to increase the sensitivity. Only one item displayed floor and ceiling effects—the reliability of coding of conditions and interventions was either “not tested” or, if tested, was “good”.
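A floor or ceiling effect shows up when the assessed levels for an item pile up at the lowest or highest level. As a rough illustration only, the sketch below computes the share of databases at each extreme for one item; the function name and the example data are hypothetical, and no cut-off is applied because the text does not state one.

```python
from collections import Counter
from typing import Sequence, Tuple


def floor_ceiling_shares(
    levels: Sequence[int], lowest: int = 1, highest: int = 4
) -> Tuple[float, float]:
    """Return (share at lowest level, share at highest level) for one item."""
    if not levels:
        raise ValueError("at least one assessed database is required")
    counts = Counter(levels)
    total = len(levels)
    return counts[lowest] / total, counts[highest] / total


# Invented levels for one item across 10 databases, split between the extremes.
example_levels = [1, 1, 4, 4, 4, 1, 4, 4, 1, 4]
floor_share, ceiling_share = floor_ceiling_shares(example_levels)
```

An item whose two shares together approach 1, as in the invented data, behaves like the coding reliability item described above: it separates databases into only two groups and adds little discrimination.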

The modified instrument (table 1) was retested by reapplying it to the same 20 databases and was found to be less ambiguous and therefore easier to administer. In general, the spread of results was better although the mean level for each item was largely unchanged. To ensure improvements in face validity were maintained, an instruction manual was written containing clear definitions of all terms and explicit rules for determining the level of each item.

Table 1

Criteria for assessing the coverage and accuracy of a clinical database

Creation of the website

The website was created with database users’ needs in mind. It allows visitors to search for and identify databases that may be suitable for their need, whether that be evaluative research, clinical audit, supporting shared decision making models, or strategic planning of services. A facility exists to search on the basis of one or more of: body system; pathogenesis; health care intervention; age group; and geographical area. Each database entry is designed to achieve three objectives: (1) to inform potential users as to the scope and management of a database (inclusion criteria, geographical area and time period covered, dataset, management of security and confidentiality, outputs); (2) to explain how it can be accessed (contact details of custodian); and (3) to report its methodological quality (using the new instrument). All of this information has to be obtained by a trained interviewer to ensure that the assessment is independent.
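The search facility described above can be pictured as filtering directory entries on one or more of the five fields. This is a minimal sketch under that assumption; `DirectoryEntry`, `search`, and the exact-match behaviour are hypothetical and say nothing about how the actual site is implemented.

```python
from dataclasses import dataclass
from typing import Iterable, List

# The five search dimensions named in the text.
SEARCH_FIELDS = frozenset({
    "body_system",
    "pathogenesis",
    "intervention",
    "age_group",
    "geographical_area",
})


@dataclass(frozen=True)
class DirectoryEntry:
    """Hypothetical directory record holding only the searchable fields."""
    name: str
    body_system: str
    pathogenesis: str
    intervention: str
    age_group: str
    geographical_area: str


def search(entries: Iterable[DirectoryEntry], **criteria: str) -> List[DirectoryEntry]:
    """Return entries matching every supplied criterion (exact match)."""
    unknown = set(criteria) - SEARCH_FIELDS
    if unknown:
        raise ValueError(f"unsupported search fields: {sorted(unknown)}")
    return [
        entry
        for entry in entries
        if all(getattr(entry, field) == value for field, value in criteria.items())
    ]
```

Calling `search(entries, body_system="endocrine", geographical_area="UK")` would narrow the list on two fields at once, while supplying no criteria returns every entry.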

The information provided on the coverage and accuracy of the identified databases will enable an assessment to be made as to their suitability—for example, evaluative research might require information on confounders to be included in a database to allow adequate risk adjustment of clinical outcomes. In contrast, a simpler database would be adequate to compare the rates of use of a particular intervention in different populations. Examples of possible applications are shown in box 1. The website enables a user to compare potentially relevant databases by means of “at a glance” histograms (fig 1).

Figure 1

Histograms comparing the quality of data in four diabetes databases.
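To give a sense of what an "at a glance" comparison might look like, the sketch below renders one database's item levels as a plain text bar chart. It is an illustration only: `render_histogram` and the bar characters are our own assumptions, not a description of the charts on the website.

```python
from typing import Mapping


def render_histogram(levels: Mapping[str, int], max_level: int = 4) -> str:
    """Render one bar per item: '#' for attained levels, '.' for the remainder."""
    if not levels:
        return ""
    width = max(len(item) for item in levels)  # align the bars in one column
    lines = []
    for item, level in levels.items():
        if not 1 <= level <= max_level:
            raise ValueError(f"{item}: level must be between 1 and {max_level}")
        bar = "#" * level + "." * (max_level - level)
        lines.append(f"{item.ljust(width)} {bar} {level}")
    return "\n".join(lines)
```

Printing the output for several databases side by side would let a user see, item by item, where one source is more rigorous than another.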

Box 1 Examples of potential uses of DoCDat


  • It is uncertain how effective coronary artery bypass grafting is in elderly patients as none were included in the original randomised trials. Identification of a large clinical database that allows risk adjustment for confounders would enable outcomes in different age groups to be ascertained.

  • The equity of use of many clinical services is unknown. Concern that men and women may not experience the same opportunity to be admitted to intensive care could be investigated if a database of all admissions to a large number of intensive care units was identified.

  • Little is known about the impact on clinical outcomes of different ways of organising and delivering services. If similar databases from different healthcare systems could be identified, meaningful comparisons of outcomes could be made which would help increase our understanding of the best way to organise services.


  • The staff of a day surgery unit who want to gauge the quality of their services could do so if they could identify suitable databases containing information based on the performance of other comparable units.

  • To overcome the clinical limitations of routine administrative data, purchasers could identify databases that provide comparative information on the performance of providers in those areas that are their priorities.

Clinical decision making

  • For patients to be involved in making informed decisions about their own care, there is a need for decision analytical models that consider the probabilities of each possible outcome in that healthcare system. To construct such software, developers need to identify large databases which can provide those probabilities—for example, a database of people with diabetes could provide information on the likelihood of emergency hospital admissions that might result from a change in treatment.

Planning services

  • Attempts to reconfigure expensive specialist services such as intensive care could benefit from data on all existing admissions (and their outcomes) in a geographical region. The impact of merging two ICUs or introducing a new one could be determined by modelling data if a suitable database could be identified.

DoCDat only provides an overview of each clinical database. Clearly, despite careful enquiries by the assessor, the validity of the information reported on the website is dependent on the knowledge and views of the custodian of the database. A warning to this effect is included, together with a request for users of DoCDat to let us know of any concerns they have about the accuracy of our assessments. If users wish to delve deeper, it is necessary to find out more from the database custodian whose contact details are provided in the DoCDat entry.

The only feasible means of judging the initial impact of the website is by a process measure—the frequency of visitors to the site. The site was launched in July 2001. Although there has been little effort to market the site, by November 2002 there were around 3500 visits a month.


An instrument for describing the quality of a clinical database was successfully developed. For the first time a practical standardised method exists for summarising database quality and this, in turn, enables meaningful comparisons to be made. In addition, a website has been established that provides a simple free means for potential database users to search for and select suitable sources to meet their needs.

The number of databases included in DoCDat will be expanded over the next few years. Following the deliberate policy of including databases from a wide variety of clinical settings during the initial development phase, we have focused on areas of national priority in the UK, commencing with coronary heart disease and cancer. By February 2003 over 90 databases had been included (see table 2 on the QSHC website). In parallel, it is hoped to include databases from outside the UK, following interest from other countries. Alternatively, people may wish to collaborate and establish DoCDat in their own country. An international network could then be established to link the websites.

While adding more databases is the priority, it is also essential to update and maintain all the entries. This is being done by requesting information of changes from database custodians as they are instituted and by an annual enquiry initiated by DoCDat staff.

Key messages

  • Clinical databases offer excellent opportunities to evaluate and audit health care.

  • Numerous clinical databases exist but the level of detail, completeness, and accuracy varies.

  • The limited use that has been made of them partly reflects the difficulty potential users face in identifying the databases that exist and their methodological quality.

  • An instrument for assessing and reporting the quality of clinical databases has been developed.

  • A website has been established that reports the databases that exist in the UK and their quality. Extension to include other countries is planned.

A further development underway is to modify the website for local use (so called “Local DoCDat”). Hospital managers are often aware that their clinicians have established their own clinical databases but are unsure of the extent, content, and nature of such activity. Given the costs involved and increasing concern about data confidentiality,12 a central directory of all the clinical databases in a hospital might prove very helpful for managers.

DoCDat is being evaluated in three ways. Firstly, the level of use is being monitored using the number of website “hits”. This somewhat limited information is being supplemented by asking visitors to register their interest by recording their e-mail address. While this will not be comprehensive (as registration is voluntary), it will provide some insight into the background of users. Secondly, visitors are invited to comment and make suggestions as to how DoCDat might be improved. Thirdly, ad hoc surveys will be conducted of registered users, database custodians, relevant statutory bodies (such as the Commission for Health Improvement and the Department of Health), and those involved in funding, assessing and publishing research and audit results.

While enabling greater access and use of existing clinical databases is the immediate aim of DoCDat, its other aim is to improve their quality. Our experience suggests that some database custodians have rather limited knowledge and understanding of the methodological issues relating to database quality. DoCDat aims to advise, where appropriate, how quality can be improved. This is being facilitated by establishing a network of database custodians and putting them in contact with one another to enable practical experiences to be shared.


We thank the members of the expert Development Group: Nick Black (convenor), Colin Sanderson, Jeremy Wyatt, Kathy Rowan, Barney Reeves, Andy Padkin, James Raftery, Jan van der Meulen, Jane Thomas, David Ansell, Azim Lakhani, Sean Boyle, Imogen Evans, Paul Roderick, Alice McCleod, Stephen Proctor, Bruce Keogh, Dajue Wang, Maurice Ward, Peter McCulloch, Gina Radford, and Jack Williams for their help; the custodians of 20 databases for their cooperation; the Nuffield Trust for funding the initial development; and the National Centre for Health Outcomes Development for funding (2001–2004).



  • See editorial commentary, pp 327–8
