PURPOSE: While lung nodules are commonly encountered in pulmonary practice, relatively little is known about their management in community settings. A validated, automated method to identify patients with lung nodules based on information contained in electronic administrative records and diagnostic radiology reports would greatly facilitate research in this area.
METHODS: Using a population-based sample of all active members of a large, integrated health system from 2006 through 2010, we employed an iterative process to develop an automated method to identify patients with lung nodules, using information contained in electronic administrative files and diagnostic radiology reports. First, we used combinations of diagnostic (ICD-9) and procedure (CPT) codes to identify and retrieve dictated radiology reports of potential interest. Then, we developed and refined a natural language processing (NLP) algorithm to perform free text searches and identify reports from patients with lung nodules measuring up to 30 mm in diameter. A random sample of 115 radiology reports, stratified by ICD-9 code and year, was reviewed by an expert clinician, providing a reference standard against which to evaluate the accuracy of the NLP algorithm.
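The two-stage selection described in METHODS can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the study's algorithm: the abstract does not list the actual codes or search rules, so the code sets, the regular expression, the requirement that both a diagnostic and a procedure code be present, and the names `CANDIDATE_ICD9`, `CANDIDATE_CPT`, `is_candidate`, `nodule_sizes_mm`, and `mentions_nodule_up_to_30mm` are all hypothetical. The sketch also ignores negation (for example, "no nodule"), which a real NLP algorithm would need to handle.

```python
import re

# Hypothetical placeholders: the abstract does not name the five ICD-9
# codes or four CPT codes that were actually used.
CANDIDATE_ICD9 = {"ICD9_A", "ICD9_B"}
CANDIDATE_CPT = {"CPT_A", "CPT_B"}

NUMBER = r"(\d+(?:\.\d+)?)"
UNIT = r"(mm|cm)\b"

# Assumed pattern: a size in mm or cm within the same sentence as, and
# within 40 characters of, the word "nodule" or "nodules".
SIZE_NEAR_NODULE = re.compile(
    rf"(?:{NUMBER}[\s-]*{UNIT}[^.]{{0,40}}?\bnodules?\b"
    rf"|\bnodules?\b[^.]{{0,40}}?{NUMBER}[\s-]*{UNIT})",
    re.IGNORECASE,
)


def is_candidate(icd9_codes, cpt_codes):
    """Stage 1: flag a report for retrieval based on administrative codes.

    Assumes a report qualifies when it carries at least one code from
    each set; the study's actual code combinations are not stated.
    """
    return bool(CANDIDATE_ICD9 & set(icd9_codes)) and bool(
        CANDIDATE_CPT & set(cpt_codes)
    )


def nodule_sizes_mm(report_text):
    """Return every size (in mm) that the pattern finds near 'nodule'."""
    sizes = []
    for match in SIZE_NEAR_NODULE.finditer(report_text):
        # Groups 1-2 hold "size then nodule"; groups 3-4 hold the reverse.
        if match.group(1):
            value, unit = match.group(1), match.group(2)
        else:
            value, unit = match.group(3), match.group(4)
        factor = 10.0 if unit.lower() == "cm" else 1.0
        sizes.append(float(value) * factor)
    return sizes


def mentions_nodule_up_to_30mm(report_text):
    """Stage 2: free-text check for a nodule measuring up to 30 mm."""
    return any(0 < size <= 30.0 for size in nodule_sizes_mm(report_text))
```

In this sketch, a report would count as a hit only if `is_candidate` accepts its codes and `mentions_nodule_up_to_30mm` accepts its dictated text.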
RESULTS: A combination of five diagnostic codes, four procedure codes, and the final NLP algorithm identified 7,171 unique patients with at least one radiology report indicating one or more incident lung nodules. Compared with the judgment of an expert clinician, the sensitivity and specificity of the NLP algorithm were 95% and 87%, respectively. Among all individuals identified with lung nodules, the mean (SD) age was 65 (14) years; 14% were under 50 years old. Slightly more than half (54%) were women, and 15% were current smokers. Almost 20% were Hispanic, 55% were non-Hispanic white, 10% were non-Hispanic black, and 8% were Asian/Pacific Islander. Fourteen percent were subsequently diagnosed with lung cancer.
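For readers less familiar with the accuracy measures reported in RESULTS, the sketch below shows how sensitivity and specificity are computed when an algorithm's output is compared against a reference standard. The function name `sensitivity_specificity` and the toy labels in the usage note are hypothetical; the abstract does not report the underlying counts, so no study data are reproduced here.

```python
def sensitivity_specificity(reference, predicted):
    """Compare predicted labels against reference-standard labels.

    Both arguments are equal-length sequences of booleans, where True
    means "report indicates a lung nodule".
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have equal length")

    pairs = list(zip(reference, predicted))
    tp = sum(1 for ref, pred in pairs if ref and pred)          # true positives
    fn = sum(1 for ref, pred in pairs if ref and not pred)      # false negatives
    tn = sum(1 for ref, pred in pairs if not ref and not pred)  # true negatives
    fp = sum(1 for ref, pred in pairs if not ref and pred)      # false positives

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative reference")

    sensitivity = tp / (tp + fn)  # share of true nodule reports detected
    specificity = tn / (tn + fp)  # share of non-nodule reports rejected
    return sensitivity, specificity
```

As a toy example with made-up labels, `sensitivity_specificity([True, True, False, False], [True, False, False, False])` gives a sensitivity of 0.5 and a specificity of 1.0.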
CONCLUSIONS: A combination of diagnostic codes, procedure codes, and an NLP algorithm for free text searching of radiology reports can accurately and efficiently identify patients with incident lung nodules, many of whom are subsequently diagnosed with lung cancer.
CLINICAL IMPLICATIONS: Automated identification of patients with lung nodules will facilitate research and help identify optimal strategies for management.
DISCLOSURE: Michael Gould: University grant monies: Grant support from NIH/NCI to study lung cancer diagnosis and staging
The following authors have nothing to disclose: Kim Danforth, Megan Early, Anne Kosco, Chengyi Zheng
No Product/Research Disclosure Information