Rationale: To extract, identify, and validate the treatment lines used to treat Veterans with follicular lymphoma.
Background: With the adoption of electronic health record (EHR) systems, leveraging EHR data is becoming increasingly important for clinical and observational research, especially in oncology, where precision oncology has become central to the Cancer Moonshot Initiative. One of the greatest challenges in using EHR data is extracting a cancer patient’s treatment history. The difficulty lies in identifying treatment “lines,” which may include one or more drugs, with each drug dispensation often recorded in an unstructured format within the EHR. Our objective was to conceptualize, develop, and validate an algorithm that reconstructs a patient’s cancer treatment line history from single-agent EHR pharmacy data in a cohort of follicular lymphoma patients treated in the Veterans Health Administration (VHA).
Methods: The CANCER CARE algorithm recreates and formalizes the heuristic a clinician uses to identify dispensed treatment lines. It takes two inputs: (1) National Comprehensive Cancer Network treatment guidelines, which specify the recommended chemotherapy lines and their constituent antineoplastic agents for the cancer of interest; and (2) single-agent dispensation information retrieved from the VA Corporate Data Warehouse. The algorithm uses rules to map concordant dispensed agents to a treatment line while accounting for common practice variations, such as agents omitted at the start or in the middle of a treatment line. It also identifies the initiation of a new line based on a change in agents received or time gaps between treatments. The algorithm was validated by comparing its output with a set of 100 treatment lines that were independently annotated by a clinician in a cohort of patients with follicular lymphoma. Accuracy, sensitivity, and precision were measured.
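The rule-based mapping described in the Methods can be illustrated with a minimal Python sketch. This is not the CANCER CARE implementation: the regimen table `REGIMENS`, the 90-day `MAX_GAP` threshold, and the function names are all hypothetical, chosen only to show how single-agent dispensations might be grouped into lines, how omitted agents can be tolerated, and how a change in agents or a time gap can start a new line.

```python
from datetime import date, timedelta

# Hypothetical regimen table (line name -> constituent agents); illustrative only,
# not the guideline-derived table used by the authors.
REGIMENS = {
    "rituximab": {"rituximab"},
    "BR": {"bendamustine", "rituximab"},
    "RCVP": {"rituximab", "cyclophosphamide", "vincristine", "prednisone"},
    "RCHOP": {"rituximab", "cyclophosphamide", "doxorubicin",
              "vincristine", "prednisone"},
}
MAX_GAP = timedelta(days=90)  # assumed threshold; the source does not state one


def candidates(agents):
    """Regimens that contain every observed agent (omitted agents are tolerated)."""
    return [name for name, parts in REGIMENS.items() if agents <= parts]


def label(agents):
    """Pick the smallest regimen consistent with the observed agents."""
    names = candidates(agents)
    if not names:
        return "unclassified"
    return min(names, key=lambda name: len(REGIMENS[name]))


def build_lines(dispensations):
    """Group (date, agent) dispensations into (line, start, end) tuples."""
    lines = []
    agents, start, last = set(), None, None
    for day, agent in sorted(dispensations):
        gap = last is not None and day - last > MAX_GAP
        # A new agent that fits no regimen with the current agents signals a change.
        changed = bool(agents) and not candidates(agents | {agent})
        if gap or changed:
            lines.append((label(agents), start, last))
            agents, start = set(), None
        if start is None:
            start = day
        agents.add(agent)
        last = day
    if agents:
        lines.append((label(agents), start, last))
    return lines


if __name__ == "__main__":
    rchop = ["rituximab", "cyclophosphamide", "doxorubicin", "vincristine", "prednisone"]
    history = [(date(2020, 1, 1), a) for a in rchop]
    # Second cycle with rituximab omitted, then a switch after a long gap.
    history += [(date(2020, 1, 22), a) for a in rchop if a != "rituximab"]
    history += [(date(2020, 8, 1), a) for a in ("bendamustine", "rituximab")]
    for line in build_lines(history):
        print(line)
```

In this sketch a line is labeled with the smallest regimen that contains all observed agents, which is one simple way to tolerate omitted agents; a production rule set would likely need additional handling, for example for maintenance rituximab following a combination line.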
Results: CANCER CARE had an accuracy of 96%. For the most prevalent lines, accuracy, sensitivity, and precision were 98%, 97%, and 100%, respectively, for rituximab and 99%, 100%, and 95%, respectively, for RCHOP. Accuracy, sensitivity, and precision for RCVP and BR were all 100%.
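As an illustration of how per-line metrics of this kind can be computed, the sketch below scores one treatment line against all others (a one-vs-rest comparison). The one-vs-rest framing, the function name `line_metrics`, and the toy labels are assumptions made for illustration; the labels are not study data, and the source does not describe the exact computation used.

```python
def line_metrics(truth, predicted, line):
    """One-vs-rest accuracy, sensitivity, and precision for a single line."""
    pairs = list(zip(truth, predicted))
    tp = sum(t == line and p == line for t, p in pairs)
    fp = sum(t != line and p == line for t, p in pairs)
    fn = sum(t == line and p != line for t, p in pairs)
    tn = sum(t != line and p != line for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Sensitivity: share of clinician-annotated lines the algorithm recovered.
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        # Precision: share of algorithm-assigned lines the clinician confirmed.
        "precision": tp / (tp + fp) if tp + fp else float("nan"),
    }


if __name__ == "__main__":
    # Toy labels for illustration only (not data from the study).
    truth = ["RCHOP", "RCHOP", "BR", "rituximab", "rituximab"]
    predicted = ["RCHOP", "BR", "BR", "rituximab", "rituximab"]
    print(line_metrics(truth, predicted, "BR"))
    # {'accuracy': 0.8, 'sensitivity': 1.0, 'precision': 0.5}
```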
Conclusions: Cancer treatment line identification from EHR pharmacy dispensation data using a rule-based approach is feasible with high accuracy and can be used in real-world studies of cancer patient treatment practices and outcomes.