Understanding receiver operating characteristic (ROC) curves
Pedagogical Tools and Methods
Jerome Fan, MD; Suneel Upadhye, MD, MSc; Andrew Worster, MD, MSc
Division of Emergency Medicine, McMaster University, Hamilton, Ont.
CJEM 2006;8(1):19-20
The term "receiver operating characteristic" came from tests of the ability of World War II radar operators to determine whether a blip on the radar screen represented an object (signal) or noise. The science of "signal detection theory" was later applied to diagnostic medicine.2 The determination of an "ideal" cut-off value is almost always a trade-off between sensitivity (the true-positive rate) and specificity (the true-negative rate). Because both change with each cut-off value, it is difficult for the reader to judge which cut-off is ideal. The ROC curve offers a graphical illustration of these trade-offs at each cut-off for any diagnostic test that uses a continuous variable.3 Ideally, the best cut-off value provides both the highest sensitivity and the highest specificity. It is located on the ROC curve at the point that is highest on the vertical axis and furthest to the left on the horizontal axis, that is, closest to the upper left corner (Fig. 1). However, this ideal can rarely be achieved, so one may, for example, accept lower specificity in exchange for higher sensitivity, or the reverse.
In the neuron-specific enolase (NSE) study,1 the authors chose a cut-off point of >30 μg/L with a specificity of 100% and sensitivity of 79% (Fig. 2). A cut-off point with high specificity allows the authors to "rule in" the outcome for all patients with an NSE value above the selected cut-off.4 The study indicates that patients with an NSE level >30 μg/L will die before hospital discharge and those with an NSE level <29 μg/L will possibly survive to hospital discharge.
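The trade-off described above can be made concrete with a small calculation. The following is a minimal Python sketch, not the method used in the NSE study: the marker values and outcomes are invented for illustration, and the function names `sens_spec` and `roc_points` are hypothetical. It treats any value above the cut-off as a positive test and reports one point of the ROC curve per cut-off.

```python
# Hypothetical marker values (in μg/L) and outcomes (1 = died, 0 = survived).
# These numbers are invented for illustration; they are NOT from the NSE study.
values = [12, 18, 22, 25, 28, 31, 35, 40, 55, 70]
died = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]


def sens_spec(values, died, cutoff):
    """Return (sensitivity, specificity) when value > cutoff counts as positive."""
    tp = sum(1 for v, d in zip(values, died) if d and v > cutoff)
    fn = sum(1 for v, d in zip(values, died) if d and v <= cutoff)
    tn = sum(1 for v, d in zip(values, died) if not d and v <= cutoff)
    fp = sum(1 for v, d in zip(values, died) if not d and v > cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def roc_points(values, died):
    """One (cutoff, 1 - specificity, sensitivity) triple per observed value."""
    points = []
    for cutoff in sorted(set(values)):
        sens, spec = sens_spec(values, died, cutoff)
        points.append((cutoff, 1 - spec, sens))
    return points


for cutoff, fpr, tpr in roc_points(values, died):
    print(f"cut-off > {cutoff:>2}: sensitivity = {tpr:.2f}, 1 - specificity = {fpr:.2f}")
```

With these invented values, a cut-off of 30 gives a specificity of 1.00 at a sensitivity of 0.83, whereas a cut-off of 20 gives a sensitivity of 1.00 at a specificity of 0.50. Plotting sensitivity against 1 - specificity for every cut-off traces the ROC curve.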
The area under the ROC curve (AUC) is widely recognized as the measure of a diagnostic test's discriminatory power.5 The maximum value for the AUC is 1.0, indicating a (theoretically) perfect test (i.e., 100% sensitive and 100% specific). An AUC value of 0.5 indicates no discriminative value (i.e., 50% sensitive and 50% specific) and is represented by a straight, diagonal line extending from the lower left corner to the upper right (Fig. 3). There are several scales for AUC value interpretation but, in general, ROC curves with an AUC ≤0.75 are not clinically useful and an AUC of 0.97 has a very high clinical value, correlating with likelihood ratios of approximately 10 and 0.1. The AUC for NSE was 0.87, demonstrating moderate discriminatory power and, therefore, potential utility as a diagnostic test in determining the non-survivors of a cardiac arrest with return of spontaneous circulation.
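One way to see what the AUC measures is to compute it directly. The sketch below relies on a standard equivalence that the article does not state: the empirical AUC equals the proportion of (patient with the outcome, patient without the outcome) pairs in which the patient with the outcome has the higher test value, with ties counted as one half. The data are invented for illustration and the function name `auc` is hypothetical.

```python
# Hypothetical marker values and outcomes (1 = outcome present, 0 = absent).
# Invented for illustration; NOT data from the NSE study.
values = [12, 18, 22, 25, 28, 31, 35, 40, 55, 70]
died = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]


def auc(values, outcome):
    """Empirical AUC by pairwise comparison (ties count as one half)."""
    pos = [v for v, o in zip(values, outcome) if o]
    neg = [v for v, o in zip(values, outcome) if not o]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # pair ranked correctly
            elif p == n:
                wins += 0.5   # tie: counts as half
    return wins / (len(pos) * len(neg))


print(f"AUC = {auc(values, died):.3f}")
```

A test that separates the two groups completely scores 1.0 under this calculation, and a test whose values carry no information about the outcome scores about 0.5, matching the two reference values described above.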

Fig. 1. Receiver operating characteristic curve illustrating high discriminatory power.

Fig. 2. Receiver operating characteristic curve for the overall performance of neuron-specific enolase to predict survival at 48 hours after return of spontaneous circulation.

Fig. 3. Receiver operating characteristic curve illustrating no discriminatory power.
It is important to note that ROC performance may change when the diagnostic test is applied to different clinical situations (e.g., patient populations) or under different phases of test development (derivation, validation). The most useful information about a diagnostic test likely comes from pooling the results of several studies that examine the same test in different situations, generating averaged specificity, sensitivity and ROC, and so giving a truer understanding of the diagnostic test's utility.6
In summary, ROC analysis provides important information about diagnostic test performance: the closer the apex of the curve is to the upper left corner, the greater the discriminatory ability of the test (i.e., the true-positive rate is high and the false-positive [1 - Specificity] rate is low). This is measured quantitatively by the AUC, such that a value of >0.96 indicates excellent discriminatory ability. Like all summary measures, however, the AUC has a confidence interval that must be taken into consideration. In the end, it will be rare for a diagnostic test to have both 100% sensitivity and 100% specificity. The clinician will have to decide which cut-off value provides the likelihood ratios and the sensitivity and specificity values that have the greatest clinical value in the diagnosis of any disorder.
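The likelihood ratios attached to a given cut-off follow directly from its sensitivity and specificity. The sketch below uses the standard definitions, which the article does not spell out: positive likelihood ratio = sensitivity / (1 - specificity), and negative likelihood ratio = (1 - sensitivity) / specificity. The function name `likelihood_ratios` is hypothetical, and the treatment of zero denominators is an assumption made so that the example runs.

```python
import math


def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity given as proportions."""
    if specificity == 1:
        # No false positives: LR+ is unbounded (undefined if sensitivity is also 0).
        lr_pos = math.inf if sensitivity > 0 else math.nan
    else:
        lr_pos = sensitivity / (1 - specificity)
    if specificity == 0:
        # No true negatives: LR- is unbounded (undefined if sensitivity is 1).
        lr_neg = math.inf if sensitivity < 1 else math.nan
    else:
        lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


# Sensitivity 79% and specificity 100%, as reported for the >30 μg/L cut-off.
print(likelihood_ratios(0.79, 1.00))

# An illustrative (assumed) test with 90% sensitivity and 90% specificity.
print(likelihood_ratios(0.90, 0.90))
```

At a specificity of 100% the positive likelihood ratio is unbounded, which is why such a cut-off can be used to rule in the outcome. The second call yields ratios of roughly 9 and 0.11.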
Dr. Andrew Worster, Emergency Department, Hamilton Health Sciences, McMaster University Medical Centre, 1200 Main St. W, Hamilton ON L8N 3Z5
