Conference Coverage

Using Natural Language Processing in Radiology Reports to Identify the Presence of Metastatic Disease in Veterans With Prostate Cancer

Abstract 9: 2017 AVAHO Meeting


 

Background: Radiographic imaging is important for the diagnosis and management of cancer. Radiology reports contain a wealth of information but are typically formatted as unstructured text, which makes large-scale information extraction challenging. We validated a natural language processing (NLP) algorithm to identify the presence of metastatic disease in radiographic imaging reports.

Methods: Using the VA Clinical Cancer Registry and the Corporate Data Warehouse, we identified approximately 3 million radiology reports for 120,374 patients receiving care for prostate cancer in the VA from 2006 to 2015. We focused on the impression section of CT, PET/CT, X-ray, bone scan, and MRI reports. We extended the “ConText” algorithm of Chapman et al. to identify the presence of metastatic disease: (1) using UMLS, we identified terms compatible with “metastasis”; (2) report impressions were preprocessed and tokenized both at the sentence level and within each sentence; (3) positive and negative trigger phrases were implemented as a series of regular expressions, which were refined over a number of iterations using training data from 2 batches of 600 reports, allowing us to extend trigger identification to a larger set of phrases. The final algorithm was validated using an independent sample of 2,000 reports annotated by a domain expert.
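The three steps above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the term list, the negation triggers, and the function names (classify_sentence, classify_impression) are hypothetical stand-ins, since the abstract does not publish the UMLS-derived vocabulary or the refined regular expressions. The sketch only shows the general ConText-style idea of scoping a negation trigger to the sentence that contains the target term.

```python
import re

# Hypothetical target vocabulary (the study derived its terms from UMLS).
TARGET = re.compile(r"\bmetasta(?:sis|ses|tic)\b", re.IGNORECASE)

# Hypothetical negation triggers; the study's trigger phrases are not listed.
NEGATION = re.compile(
    r"\b(?:no|without|negative for|no evidence of|free of)\b", re.IGNORECASE
)

# Naive sentence splitter: break after ., ! or ? followed by whitespace.
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")


def classify_sentence(sentence):
    """Label one sentence as 'no_mention', 'negated', or 'affirmed'."""
    match = TARGET.search(sentence)
    if match is None:
        return "no_mention"
    # A trigger only counts if it appears before the target in this sentence.
    if NEGATION.search(sentence[:match.start()]):
        return "negated"
    return "affirmed"


def classify_impression(text):
    """Label a report impression; an affirmed mention outranks a negated one."""
    labels = [classify_sentence(s) for s in SENTENCE_SPLIT.split(text.strip()) if s]
    if "affirmed" in labels:
        return "affirmed"
    if "negated" in labels:
        return "negated"
    return "no_mention"
```

Because triggers are scoped to a single sentence, an impression such as "No acute fracture. New osseous metastases in the pelvis." is labeled as an affirmed mention: the "No" in the first sentence does not negate the finding in the second.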

Results: On the first training set of 600 radiology reports, the algorithm achieved an accuracy of 94% for reports with no mention of metastasis, 85% for negated mentions of metastasis, and 74% for mentions of metastasis without negation. Errors were reviewed, resulting in vocabulary expansion and an improved implementation of the regular expressions to capture the expanded trigger phrases. The modified algorithm was then tested on a new set of 600 reports, where accuracy increased to 96% for no mention of metastasis, 90% for negated mentions of metastasis, and 89% for mentions of metastasis without negation. After additional modifications, the revised algorithm was validated using an independent sample of 2,000 reports. The accuracy was 96% (Cohen’s kappa ~1), with a precision of 98% and a sensitivity of 98%.
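For readers less familiar with the reported metrics, the following Python sketch shows how accuracy, precision, sensitivity, and Cohen's kappa are conventionally computed from a binary confusion matrix. The function name and the counts in the usage note are hypothetical and chosen only for easy arithmetic; they are not the study's data, and the abstract does not state how its three report categories were collapsed for the validation metrics.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, sensitivity, and Cohen's kappa.

    tp, fp, fn, tn are counts of true positives, false positives,
    false negatives, and true negatives (algorithm vs. expert annotation).
    """
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)      # positive predictive value
    sensitivity = tp / (tp + fn)    # recall

    # Agreement expected by chance, from the marginal totals of each rater.
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (accuracy - p_expected) / (1 - p_expected)

    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "kappa": kappa,
    }
```

With the made-up counts tp=40, fp=10, fn=5, tn=45, this gives an accuracy of 0.85, a precision of 0.80, a sensitivity of about 0.89, and a kappa of 0.70. Kappa is lower than accuracy here because it discounts the agreement that would be expected by chance alone.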

Conclusions: Detecting the presence of metastatic disease from radiology reports is feasible with NLP.

References: (1) Sarkar S, Das S. A review of imaging methods for prostate cancer detection. Biomed Eng Comput Biol. 2016;7(Suppl 1):1-15. (2) Chapman WW, Bridewell W, Hanbury P, Cooper GF, Buchanan BG. A simple algorithm for identifying negated findings and diseases in discharge summaries. J Biomed Inform. 2001;34(5):301-310. (3) Harkema H, Dowling JN, Thornblade T. ConText: An algorithm for determining negation, experiencer, and temporal status from clinical reports. J Biomed Inform. 2009;42(5):839-851.
