
Shuffle the deck, flip that coin: randomization comes to medicine
D Neuhauser, M Diaz
Department of Epidemiology and Biostatistics, Case School of Medicine, Case Western Reserve University, Cleveland, Ohio, USA

Correspondence to: M Diaz PhD, Department of Epidemiology and Biostatistics, Case School of Medicine, Case Western Reserve University, 10900 Euclid Ave, Cleveland, OH 44106, USA


There is a lot of confusion about the first use of randomization in an experiment and the first use of randomization in medicine. Some writers think that the great statistician R A Fisher carried out the first randomized trial (in agriculture)1 and published the results in 1923.2–5 Others think that the first randomized medical trial was carried out by the British Medical Research Council in the 1940s.6,7 But there are earlier examples, and we think we have found the oldest.

Readers of this series will know that controlled trials go back as far as the Biblical Daniel,8 and include the evaluation of smallpox inoculation in 1721,9 the evaluation of Mesmerism in 1784,10 and Semmelweis’s trials to reduce hospital infections.11 To resolve this debate we have to be precise about what we mean by randomization in experiments.

The word “randomization” does not exist in the Oxford English Dictionary or its supplement.12,13 It has two common uses in social science and clinical research. The first is drawing a “random sample” from a larger population; here randomization is used to ensure that the sample is representative of the population as a whole. The second, randomization in experimental design, is our focus here. There is an experiment, purposefully and prospectively designed; there are control and experimental interventions on patients, designed so that the two groups differ in nothing, observable or unobservable, except the experimental intervention; and there is a method of assignment that is out of the control of any human and their potential biases. Assignment is done by flipping a coin, shuffling a deck of cards, drawing lots, using tables of (pseudo)random numbers or, today, computer generated random numbers. The study is then carried out, and the effect of the intervention is evaluated by comparing outcomes between the groups.

Those readers who wish to argue against our “firsts” may be able to find older studies using our definition or they can use another definition and argue for earlier studies. Here are the oldest two studies fitting this definition known to us.


Charles Peirce (1839–1914) graduated at the bottom of his class at Harvard (79th out of 90) in 1859 and went on to make major scientific contributions in astronomy, geodesy, psychology, and philosophy (as the founder of “pragmatism”). He suffered from repeated bouts of depression, and his worldly career was not successful.14 In 1879 he was appointed instructor at Johns Hopkins, where he carried out the following trial with his colleague Joseph Jastrow.15

The question addressed was this. Blindfolded people can easily tell the difference between a 1 kg and a 2 kg weight. Can a blinded person tell the difference between 1 kg and 1 kg + 2 g? Or 1 kg + 1 g? What difference is too small to detect? It was a simple question, but designing an experiment to find this “just noticeable difference” was very hard.

The blinded subject sat on one side of a barrier and could reach only the end of the balance beam of a scale. On the other side of the barrier the investigator added gram weights to, or removed them from, the kilogram weight on the scale. This was done randomly, using a specially designed deck of cards, so that the investigator would not impose his own unrecognized biases on the order of the tests and thereby create a pattern the subject could detect. The subject let go of the balance, and the gram weight was added or removed depending on the color of the card drawn from the deck. The subject then pressed on the balance, said whether the weight was more or less, and gave his degree of confidence in his choice (0, 1, 2, or 3). Fifty such tests were run at a time, and the cards were notched as a rapid way of recording the results.
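The card procedure can be sketched in Python as a shuffled deck that fixes the order of increases and decreases before a session begins. This is a minimal illustration under stated assumptions, not a reconstruction of Peirce’s actual deck: the composition (25 red and 25 black cards for one series of 50 tests), the mapping of red to “add” and black to “remove”, and the function name `peirce_style_sequence` are all hypothetical.

```python
import random


def peirce_style_sequence(n_red=25, n_black=25, seed=None):
    """Return a randomized order of weight changes for one series of tests.

    Hypothetical sketch: a red card means "add the gram weight" (+1) and a
    black card means "remove it" (-1). The deck composition is an assumption.
    """
    deck = ["red"] * n_red + ["black"] * n_black
    rng = random.Random(seed)  # seeded generator so a session can be replayed
    rng.shuffle(deck)          # the shuffle, not the investigator, fixes the order
    return [+1 if card == "red" else -1 for card in deck]


sequence = peirce_style_sequence(seed=1884)
print(len(sequence), sequence.count(+1), sequence.count(-1))  # 50 25 25
print(sequence[:10])  # first ten changes of this (arbitrary) session
```

Because every deck in this sketch holds equal numbers of red and black cards, each series contains exactly as many increases as decreases, whereas 50 independent coin flips would balance only on average. Either way, neither the investigator nor the subject can anticipate the order.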

The statistician Stephen Stigler calls this one of the best examples of a carefully developed and explained psychological experiment.16 The purposeful random assignment of the experimental changes created a known chance baseline (50/50) against which the actual results could be compared.
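The value of a known 50/50 baseline can be made concrete with a modern exact binomial test. This is a swapped-in illustration, not the analysis Peirce and Jastrow reported, and the counts used (50 tests, 32 judged correctly) are hypothetical rather than data from their study. Because the order of changes is random, a subject who cannot perceive the difference is right with probability 0.5 on each test, so the chance of reaching a given score or better by guessing alone can be computed exactly.

```python
from math import comb


def chance_tail_probability(n_trials, n_correct, p=0.5):
    """Probability of at least n_correct right answers out of n_trials
    if the subject were purely guessing with success probability p."""
    return sum(
        comb(n_trials, k) * p**k * (1 - p) ** (n_trials - k)
        for k in range(n_correct, n_trials + 1)
    )


# Hypothetical series: 50 randomized tests, 32 judged correctly (illustrative only).
print(round(chance_tail_probability(50, 32), 4))
```

A small tail probability suggests the subject was detecting the change better than chance; a large one means the score is easily explained by guessing. Without random ordering there would be no such known baseline, since a subject could score well simply by picking up a pattern in the investigator’s choices.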

Stigler considers Peirce one of the two greatest American scientific minds of the 19th century. Peirce’s study was published in 1885, but this did not stop him from being fired from Johns Hopkins the next year. The reasons are not clear but seem to have included his divorcing his abused first wife and remarrying. His cocaine addiction (which is unproven) probably did not help. It must be said that cocaine and opiates were legally dispensed at the time.14

Today you can visit the Peirce Edition Project website17 and join the Peirce International Society of Scholars. The 150th anniversary conference proceedings discuss his contributions to philosophy but make no reference to his early, explicit, and deliberate use of randomization to assign experimental interventions.18


Where in the world did the first randomized trial in medicine occur? If you ask your friends this question, it is a safe bet they will not know. The answer is Detroit, Michigan, at the Detroit Municipal Tuberculosis Sanatorium, which no longer exists. Its site is now a public park, and no monument marks this contribution to clinical evidence.

J Burns Amberson (1890–1979) was on the staff of this hospital in 1926–1927 when he carried out a clinical trial of a drug for tuberculosis (sodium gold thiosulfate, or sanocrysin). By the time the trial was published in 1931 in the American Review of Tuberculosis, he had moved to Bellevue Hospital in New York City.19

To quote from this paper: “On the basis of clinical, X-ray and laboratory findings the 24 patients were then divided into two approximately comparable groups of 12 each. The cases were individually matched one with another in making this division. Obviously, the matching could not be precise but it was as close as possible, each patient having been studied independently by two of us. Then, by a flip of the coin, one group became identified as group I (sanocrysin treated) and the other as group II (control). The members of the separate groups were known only to the nurse in charge of the ward and the two of us. The patients themselves were not aware of any distinction in the treatment administered.”

Today we would not flip a coin once to decide which of the two matched groups received the drug; instead we would use a table of random numbers 24 times to assign each individual to one of the groups. Amberson’s group found no difference in outcomes, and this treatment was abandoned.
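The difference between the two approaches can be sketched in a few lines of Python. This is a minimal illustration under assumptions: the patient labels (`P01` to `P24`), the 12/12 split standing in for Amberson’s matched groups, and the helper names `flip_once` and `assign_individually` are all hypothetical. The first function mirrors a single coin flip that only decides which pre-formed group is treated; the second makes an independent random draw for each patient, as the paragraph above describes.

```python
import random


def flip_once(group_a, group_b, rng):
    """Single-flip allocation: one coin flip decides which pre-matched group is treated."""
    if rng.random() < 0.5:
        return list(group_a), list(group_b)
    return list(group_b), list(group_a)


def assign_individually(patients, rng):
    """One independent 50/50 draw per patient; no human chooses the arms."""
    treated, control = [], []
    for patient in patients:
        (treated if rng.random() < 0.5 else control).append(patient)
    return treated, control


# Hypothetical patient IDs; the first 12 and last 12 stand in for the matched groups.
patients = [f"P{i:02d}" for i in range(1, 25)]
group_a, group_b = patients[:12], patients[12:]
rng = random.Random(1926)

treated, control = flip_once(group_a, group_b, rng)
print("one flip:", len(treated), "treated,", len(control), "control")

treated, control = assign_individually(patients, rng)
print("24 draws:", len(treated), "treated,", len(control), "control")

# Number of distinct allocations each scheme can produce.
print(2, "versus", 2 ** len(patients))  # 2 versus 16777216
```

With a single flip, the composition of each arm is fixed by the investigators’ matching and only the label is random, so only two allocations are possible. With a draw per patient, every individual’s assignment is out of human hands. Simple per-patient draws will not always yield 12 in each arm, which is why trials commonly use restricted schemes, such as a coin flip within each matched pair; that refinement is not shown here.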

Unlike Peirce, Amberson gained in his lifetime all the honors his field had to offer: Professor at Columbia University Medical School and head of the Chest Service at Bellevue with devoted, excellent resident physicians; President of the National Tuberculosis Association in 1942; winner of the Trudeau Medal in 1952; an honorary degree from the University of Pennsylvania in 1953; and over 70 publications.20–22 He lived to see the end of the sanatorium era of tuberculosis treatment, brought about by the introduction of streptomycin, which led to ambulatory treatment7 and to the hope that tuberculosis could be eradicated. Since his death, HIV/AIDS and drug-resistant tuberculosis have, for now, dashed this hope.

Peirce is now honored and Amberson nearly forgotten, but both are heroes for their explicit, purposeful introduction of randomization to human experimentation.


The authors acknowledge the help of Preeti Rout MD, Steven Nosack, Lucinda Klein, and Dzwinka Holian.