Automatic identification of confusable drug names

doi:10.1016/j.artmed.2005.07.005

Artificial Intelligence in Medicine

Volume 36, Issue 1, January 2006, Pages 29-42

https://doi.org/10.1016/j.artmed.2005.07.005

Summary

Objective

Methods and material

We address the problem with two new methods: one based on orthographic similarity (“look-alike”) and the other on phonetic similarity (“sound-alike”). To compare the effectiveness of the new methods with other known similarity measures for identifying confusable drug names, we developed a novel evaluation methodology.

Results

We show that the new orthographic measure (BI-SIM) outperforms other commonly used measures of similarity on a set containing both look-alike and sound-alike pairs, and that a new feature-based phonetic approach (ALINE) outperforms orthographic approaches on a test set containing solely sound-alike pairs. However, an approach that combines several different measures achieves the best results on both test sets.

Conclusion

Our methods currently form the basis of a system developed for the U.S. Food and Drug Administration to detect confusable drug names.

Introduction

Many hundreds of drugs have names that either look or sound so much alike that doctors, nurses and pharmacists can get them confused, dispensing the wrong one in errors that can injure or even kill patients. In the United States alone, an estimated 1.3 million people are injured each year from medication errors, such as administering the wrong dose or the wrong drug [1]. For example, a patient needed an injection of Narcan but instead got the drug Norcuron and went into cardiac arrest. The U.S. Food and Drug Administration (FDA) has sought to mitigate this threat by ensuring that proposed drug names that are too similar to pre-existing drug names are not approved [2]. This has motivated the research and design of algorithms underlying phonetic orthographic computer analysis (POCA), an operational system implemented by the Project Performance Corporation for the FDA.¹

A number of different lexical similarity measures have been applied to the problem of identifying confusable drug names (henceforth referred to as confusion pairs). For example, 22 distinct methods were tested on a set of drug names extracted from published reports of medication errors [3]. The methods included well-known universal measures, such as edit distance, longest common subsequence, and several variations of measures based on counting common letter n-grams, as well as measures designed specifically for associating phonetically similar names, such as Soundex and Editex. The normalized edit distance, Editex, and a trigram-based measure were identified as the most accurate.
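To make the universal measures named above concrete, the following is a minimal Python sketch of three of them: normalized edit distance, the longest common subsequence ratio, and a bigram overlap score. The function names, the lower-casing step, the normalization by the longer word, and the Dice-style bigram formula are assumptions chosen for illustration; they are not necessarily the exact variants tested in [3].

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs for insert, delete, substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def normalized_edit_distance(a: str, b: str) -> float:
    """Edit distance divided by the length of the longer word (assumed normalization)."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if ca == cb else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_ratio(a: str, b: str) -> float:
    """LCS length divided by the length of the longer word."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 1.0


def bigram_dice(a: str, b: str) -> float:
    """Dice coefficient over letter bigrams, counted as multisets."""
    a, b = a.lower(), b.lower()
    ga = Counter(a[i:i + 2] for i in range(len(a) - 1))
    gb = Counter(b[i:i + 2] for i in range(len(b) - 1))
    total = sum(ga.values()) + sum(gb.values())
    shared = sum((ga & gb).values())
    return 2 * shared / total if total else 0.0


# Example call on a pair mentioned in the text.
print(normalized_edit_distance("Narcan", "Norcuron"), lcs_ratio("Narcan", "Norcuron"))
```

Note that the first function returns a distance (lower means more similar), while the other two return similarities (higher means more similar), so the scores must be oriented consistently before any ranking or comparison.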

We formulate a general framework for representing word similarity measures based on n-grams, and propose a new measure of orthographic similarity called BI-SIM that combines the advantages of several known measures. We show that this new measure performs better on a U.S. pharmacopeial list of confusable drug names than the measures previously identified as the most accurate by [3].

In addition, we present techniques for detecting drug-name confusions that are attributed solely to high phonetic similarity. Consider the example of Xanax versus Zantac, two brand names that the Physicians’ Desk Reference (PDR) warns may be “mistaken for each other ... lead[ing] to serious medication errors” [4]. The phonetic transcription of the two names, [zænæks] and [zæntæk], reveals a sound-alike similarity that is not apparent in their orthographic form. For the detection of sound-alike confusion pairs, we apply the ALINE phonetic aligner [5], which estimates the similarity between two phonetically transcribed words. We demonstrate that ALINE outperforms orthographic approaches on a test set containing sound-alike confusion pairs.

We present a novel method of evaluating the accuracy of a measure, which aims at emulating the perspective of a person involved in the process of approving a new drug name. Our approach is to average recall values for each drug name in the test set. The recall is calculated against a published list of confusable drug names considering only the top k potential confusion pairs returned by a similarity measure. The recall values are then aggregated using the technique of macro-averaging [6].
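The evaluation idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the dictionary layout of the gold-standard list, the choice to skip names that have no listed confusion partner, and the tie-breaking by list order are all hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, List, Set


def macro_avg_recall_at_k(
    names: List[str],
    gold: Dict[str, Set[str]],
    similarity: Callable[[str, str], float],
    k: int,
) -> float:
    """Average, over all query names, of the recall among the top-k candidates.

    gold maps each name to the set of names it is listed as confusable with.
    """
    recalls = []
    for query in names:
        true_partners = gold.get(query, set())
        if not true_partners:
            continue  # assumption: names without gold partners are not scored
        candidates = [n for n in names if n != query]
        # Rank all other names by decreasing similarity to the query.
        ranked = sorted(candidates, key=lambda n: similarity(query, n), reverse=True)
        top_k = set(ranked[:k])
        recalls.append(len(true_partners & top_k) / len(true_partners))
    # Macro-averaging: every query name contributes equally.
    return sum(recalls) / len(recalls) if recalls else 0.0


def first_letter_match(a: str, b: str) -> float:
    """Deliberately crude toy similarity, used only to exercise the evaluator."""
    return 1.0 if a[0].lower() == b[0].lower() else 0.0


# Toy gold list built from the two pairs mentioned in the text.
toy_names = ["Narcan", "Norcuron", "Xanax", "Zantac"]
toy_gold = {
    "Narcan": {"Norcuron"},
    "Norcuron": {"Narcan"},
    "Xanax": {"Zantac"},
    "Zantac": {"Xanax"},
}
print(macro_avg_recall_at_k(toy_names, toy_gold, first_letter_match, k=1))
```

Any similarity function with the same two-argument signature can be passed in, which is what allows different measures to be compared at the same cutoff k.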

The next section provides the background for the problem we are addressing, several commonly-used measures of word similarity, and our methodology for evaluation. After this, we present two new methods for identifying look-alike and sound-alike drug names. We then compare the effectiveness of various measures using our recall-based evaluation methodology on a U.S. pharmacopeial list and on another test set containing sound-alike confusion pairs. We conclude with a discussion of our experimental results.

Section snippets

Background

The problem of automatic identification of confusable drug names can be stated as follows: given a large set of existing drug names, identify all pairs or sets of drug names that are potentially confusable with each other. An alternative formulation reflects the process of approving a newly proposed drug name: given a proposed drug name and a large set of existing drug names, identify all existing drug names that are potentially confusable with the proposed drug

Orthographic similarity: N-SIM

In this section, we describe the inherent strengths and weaknesses of n-gram and subsequence-based approaches. Next, we present a new, generalized framework, N-SIM, that encompasses a number of commonly used similarity measures. Following this, we describe the parametric settings for BI-SIM—a specific instantiation of this generalized framework which is aimed at combining the advantages of LCSR and BIGRAM.⁷
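The snippet above does not give the N-SIM definition, so the following Python sketch should be read as one plausible reading of an n-gram generalization of LCSR, not as the authors' formulation. Three details are assumptions: each word is padded with n-1 copies of its first letter, two n-grams are scored by the fraction of positions at which their letters agree, and the total is normalized by the length of the longer word. The function name n_sim is hypothetical.

```python
def n_sim(x: str, y: str, n: int = 2) -> float:
    """Sketch of an n-gram similarity in the spirit of N-SIM (n=2 for a BI-SIM-like score)."""
    x, y = x.lower(), y.lower()
    if not x or not y:
        return 1.0 if x == y else 0.0

    # Assumed padding: prefix n-1 copies of the first letter, so that a word
    # of length L yields exactly L n-grams and initial letters gain weight.
    px = x[0] * (n - 1) + x
    py = y[0] * (n - 1) + y
    grams_x = [px[i:i + n] for i in range(len(x))]
    grams_y = [py[j:j + n] for j in range(len(y))]

    def gram_score(g: str, h: str) -> float:
        # Assumed graded identity: fraction of positions with equal letters.
        return sum(a == b for a, b in zip(g, h)) / n

    # Same dynamic program as longest common subsequence, but a diagonal
    # step earns a graded score instead of a strict 0/1 match.
    prev = [0.0] * (len(grams_y) + 1)
    for g in grams_x:
        curr = [0.0]
        for j, h in enumerate(grams_y, start=1):
            curr.append(max(prev[j], curr[j - 1], prev[j - 1] + gram_score(g, h)))
        prev = curr

    return prev[-1] / max(len(x), len(y))


# Example call on the pair discussed in the Introduction.
print(n_sim("Xanax", "Zantac", n=2))
```

With n = 1 this sketch reduces to the longest common subsequence ratio, which is the sense in which the bigram setting can be seen as combining subsequence-based and n-gram-based evidence.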

Phonetic similarity: ALINE

In the preceding section, we proposed a new measure of orthographic similarity for identifying look-alike drug names. However, the detection of sound-alike confusion pairs often requires a different kind of approach. For this purpose, we employ ALINE [5], which computes phonetic similarity between pairs of phonetically-transcribed words. Its underlying principle is the decomposition of phonemes into elementary articulatory phonetic features.¹⁰

Evaluation methodology

We designed a new method for evaluating the accuracy of a similarity measure. Our aim was to emulate the perspective of a person involved in the process of approving a new drug name. Because of the sheer number of pharmaceutical products already in existence, it is very difficult for anyone to think of all possible drug names that may be confused with the newly proposed name. A computer program can facilitate this task by presenting the human expert with a ranked list of potential confusion

Experiments and results

We conducted two experiments with the goal of evaluating the relative accuracy of several measures of similarity in identifying confusable drug names. The first experiment was performed against a list of similar drug names reported to the USP Medication Errors Reporting Program [41] (henceforth the USP set). The USP set is a list of 363 confusion sets (both look-alike and sound-alike), which contain 582 unique drug names. Most of the confusion sets are pairs of names, but some contain three or

Discussion

The results described in Section 6 clearly indicate that BI-SIM and TRI-SIM, the newly proposed measures of orthographic similarity, outperform several currently used measures on the USP (mixed) test set regardless of the choice of the cutoff parameter k. On the sound-alike test set, EDITEX and ALINE are the most effective. However, a simple combination of several measures achieves even higher accuracy, exceeding 90% with only the 15 top pairs considered. It is worth noting that NED does

Conclusion

We have investigated the problem of identifying confusable drug name pairs. The effectiveness of several word similarity measures was evaluated using a new recall-based evaluation methodology. We have proposed a new measure of orthographic similarity that outperforms several commonly used similarity measures when tested on a publicly available list of confusable drug names. On a test set containing solely sound-alike confusion pairs, the phonetic approaches, ALINE and EDITEX, achieve the best

Acknowledgments

The first author’s research was supported by the Natural Sciences and Engineering Research Council of Canada, and the second author’s research was supported by the National Science Foundation. In addition, we are indebted to Project Performance Corporation, specifically Erica Kolatch, Rick Shangraw, and Jessica Toye, for their implementation of our techniques in the POCA system.

References (43)

S. Goldinger et al. Priming lexical neighbours of spoken words: effects of competition and inhibition. J Mem Lang (1989)

G. Navarro et al. Very fast and simple approximate string matching. Inform Process Lett (1999)

B.L. Lambert et al. Effect of orthographic and phonological similarity on false recognition of drug names. J Soc Sci Med (2001)

E. Ukkonen. Approximate string-matching with q-grams and maximal matches. Theoret Comput Sci (1992)

G.W. Adamson et al. The use of an association measure based on character structure to identify semantically related pairs of words and document titles. Inform Stor Retrieval (1974)

T.F. Smith et al. Identification of common molecular sequences. J Mol Biol (1981)

J. Lazarou et al. Incidence of adverse drug reactions in hospitalized patients. J Am Med Assoc (1998)

M. Meadows. Strategies to reduce medication errors. US Food Drug Admin Consum Mag (2003)

B.L. Lambert et al. Similarity as a risk factor in drug-name confusion errors: The look-alike (orthographic) and sound-alike (phonetic) model. Med Care (1999)

Physicians’ desk reference for nonprescription drugs and dietary supplements. 24th ed. New York, NY: Thomson PDR;...

G. Kondrak. Phonetic alignment and similarity. Comput Human (2003)

G. Salton. The smart system: experiments in automatic document processing (1971)

D. Norris. Word recognition: context effects without priming. Cognition (1987)

R. Cole. Listening for mispronunciations: a measure of what we hear during speech. Percept Psychophys (1973)

T.A. Harley. The psychology of language: from data to theory (1995)

K. Church. A stochastic parts program and noun phrase parser for unrestricted text

Dunning T, Statistical identification of language. Technical Report CRL MCCS-94–273. Computing Research Lab, New Mexico...

P. Koehn et al. Knowledge sources for word-level translation models

G. Kondrak. Identifying cognates by phonetic and semantic similarity

I.D. Melamed. Bitext maps and alignment via pattern recognition. Comp Linguist (1999)

M. Simard et al. Using cognates to align sentences in bilingual corpora


☆ A preliminary version of this paper appeared in Proceedings of the 20th International Conference on Computational Linguistics, Geneva (2004), pp. 952–958.
