Pros and cons of assessment scales
The advantage of using a scale stems from the way patients experience depressive symptoms: along a continuum from mild to severe. A scale is able to represent these gradations in severity and may be helpful in guiding the need for treatment and treatment adjustments.
Unfortunately, this ability to measure the dimensional nature of depression is also a weakness, as a threshold must be identified above which the patient is classified as warranting further investigation. Ideally, these thresholds should be established in a representative primary care sample and predict functional status as well as likelihood of meeting DSM-IV diagnostic criteria. The ability of a scale to accurately identify patients in need of attention depends directly on the threshold.
Pros and cons of symptom counts
Instruments based on depression criteria are a relatively new innovation, appearing since the establishment of DSM-IV criteria that define reference symptoms, a minimum number of which must be present to diagnose depression. Depression criteria–based instruments have the advantage of not being dependent on a threshold of symptom severity.
However, in primary care settings this can also be a weakness because the presence of depression criteria alone may not be a reliable indicator of depression-related impairment.17 Instruments that can be used in both diagnostic-criteria and scale modes have a particular advantage in that the weaknesses of each approach are offset.
Characteristics of selected screening instruments
We searched MEDLINE and the Cochrane databases for reviews of depression screening, with particular attention to reviews of primary care-based trials. Forty-one papers emerged, 3 of which were systematic reviews. For this paper, we focused on the review published by Williams and colleagues,18 which summarizes primary care data on the depression screening instruments most widely used. They examined 379 studies that compared the primary care performance of these instruments with a reference standard diagnostic interview, such as the Structured Clinical Interview for DSM-IV (SCID).19 Twenty-eight studies met their criteria and were included in the systematic review.
In Table 2 we have adapted the information from Williams’s review and added a calculation of PPV based on a 10% prevalence estimate for depression in primary care populations. We chose to exclude information on the Single Question (SQ) screen because of its very low PPV and the Hopkins Symptom Checklist (HSCL) because of its length (25 questions). In addition, we chose to add the Hospital Anxiety and Depression Scale (HADS), using operating characteristic information from 2 studies,20,21 because of its purported advantages in medically ill populations.
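The PPV calculation described above follows from Bayes' theorem once a prevalence is assumed. The following is a minimal sketch of that arithmetic, not the authors' own worksheet; the function name and the sensitivity and specificity values in the example are hypothetical and are not taken from Table 2.

```python
def positive_predictive_value(sensitivity, specificity, prevalence=0.10):
    """Return the PPV of a screening instrument at a given prevalence.

    All arguments are proportions between 0 and 1. The default prevalence
    of 0.10 mirrors the 10% primary care estimate used in the text.
    """
    # Proportion of the whole population that is depressed and screens positive
    true_positives = sensitivity * prevalence
    # Proportion of the whole population that is not depressed but screens positive
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


# Hypothetical instrument (illustrative values only, not from Table 2)
example_ppv = positive_predictive_value(sensitivity=0.80, specificity=0.90)
print(f"PPV at 10% prevalence: {example_ppv:.1%}")
```

With these illustrative inputs, fewer than half of positive screens would be true cases (a PPV of about 47%), which shows how strongly a low prevalence pulls PPV down even for an instrument with good sensitivity and specificity.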
Beyond the SQ, it is useful to comment on “2-question screening” as suggested by the USPSTF. We are unable to find justification for this in the paper by Pignone and colleagues, which served as background for the recommendations.10 Although Pignone et al did cite the report of Wells and colleagues as using a 2-item screener, the Wells study used not only 2 questions on mood and anhedonia but also other criteria in screening its population.22 Therefore, it is not appropriate as a source for 2-item screening performance characteristics.
Comparison of the operating characteristics of the selected instruments reveals that most yield PPV values in the 20% to 30% range, with the exception of the HADS, the PHQ, and the PHQ-9, which yield PPV values of 41.3%, 50%, and 55%, respectively.
The PHQ-9 (included in the Appendix) offers a further advantage over the HADS and the other instruments listed: within a single 9-item instrument, both the presence of diagnostic criteria and severity may be assessed. Kroenke and colleagues have examined the use of the PHQ-9 as a severity instrument and found it to be a reliable and valid measure of depression severity when compared with the Medical Outcomes Study Short Form (SF-20).23
We purposely have not examined negative predictive values (NPV) for the listed instruments. NPV is useful when screening with biomedical markers, where a negative result allows extrapolation into the future because the screened-for condition develops over a known, predictable time course. For example, a negative screening colonoscopy has value not just because of its current predictive value, but because we know something about how long it may take for precancerous polyps to develop in a patient with a negative screen. However, this is not the case with depression. A patient who fails to meet criteria for depression today could fully meet criteria in 2 weeks and be quite depressed. Therefore, we have chosen to focus on PPV in comparing depression screening instruments.
Selection and use of a screening instrument