Most clinical research studies require data collection from medical records. Manual data extraction is error-prone, time consuming and expensive, and it requires some level of expertise from data collectors. The development and implementation of Electronic Medical Records (EMR) have opened new opportunities for automatic data extraction. To validate the accuracy of automatic data extraction, we compared it with the same data obtained by manual chart review.
Manually collected data variables from 25 randomly selected patients were compared with the same data obtained through automatic extraction using METRIC Datamart, a Microsoft SQL-based integrative database that mirrors the electronic medical records in near real time. We selected 8 categorical variables (gender, age, invasive ventilation, non-invasive ventilation, vasopressors, transfusions, blood cultures, ICU mortality, hospital mortality and ICU readmissions), and 14 continuous variables (PEEP, TV, SBP, HR, RR, CVP, body temperature, SpO2, Hb, glucose, PaO2, ICU LOS, hospital LOS, duration of mechanical ventilation). Results were compared and presented as percent agreement and Kappa for categorical variables and as correlation coefficient and Bland-Altman mean difference for continuous variables.
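The four agreement measures named above can be sketched in a few lines of Python. This is only an illustration and not the study's analysis code: it assumes that "Kappa" means Cohen's kappa and that the correlation coefficient is Pearson's r (the abstract specifies neither), and all function names are hypothetical.

```python
from math import sqrt

def percent_agreement(manual, auto):
    """Share of patients (in %) for whom both sources give the same value."""
    matches = sum(m == a for m, a in zip(manual, auto))
    return 100.0 * matches / len(manual)

def cohen_kappa(manual, auto):
    """Cohen's kappa (assumed here): agreement corrected for chance."""
    n = len(manual)
    p_observed = sum(m == a for m, a in zip(manual, auto)) / n
    labels = set(manual) | set(auto)
    # Chance agreement from the marginal frequency of each label.
    p_expected = sum((manual.count(l) / n) * (auto.count(l) / n) for l in labels)
    if p_expected == 1.0:
        return 1.0  # both sources report the same single label throughout
    return (p_observed - p_expected) / (1.0 - p_expected)

def pearson_r(x, y):
    """Pearson correlation coefficient (assumed choice of CC)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def bland_altman_mean_difference(x, y):
    """Mean of the paired differences (manual minus automatic)."""
    return sum(a - b for a, b in zip(x, y)) / len(x)
```

Each function takes two equal-length lists, one from manual review and one from automatic extraction, ordered by patient. Categorical variables go through the first two functions and continuous variables through the last two.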
We observed that almost all manually collected continuous variables had 1–3 (4–12%) errors per category (typos in value, date or time). After correction of those errors, 10 out of 14 continuous variables had a correlation coefficient (CC) of 1 and a Bland-Altman mean difference (BAMD) of 0. Data on body temperature showed CC 0.92 and BAMD 0.10, ICU LOS showed CC 0.99 and BAMD 0.43, hospital LOS reached CC 1.0 with BAMD 0.02, and data on duration of mechanical ventilation showed CC 0.98 and BAMD 1.92 days. Seven out of 8 categorical variables showed 100% agreement with Kappa of 1. Data collected on non-invasive ventilation had 96% agreement and Kappa 0.78.
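The gap between 96% agreement and a Kappa of 0.78 for non-invasive ventilation is what one would expect when an event is uncommon, because chance agreement is then already high. The sketch below illustrates this with a hypothetical 2x2 table for 25 patients. The cell counts are an assumption chosen to match the reported figures and are not reported in the abstract, and Cohen's kappa is assumed.

```python
# Hypothetical cell counts for 25 patients (assumed, NOT from the abstract).
both_yes = 2      # manual and automatic both record non-invasive ventilation
manual_only = 1   # recorded by manual review only (the single disagreement)
auto_only = 0     # recorded by automatic extraction only
both_no = 22      # neither source records it

n = both_yes + manual_only + auto_only + both_no
p_observed = (both_yes + both_no) / n

manual_yes = both_yes + manual_only
auto_yes = both_yes + auto_only
# Agreement expected by chance, from the marginal totals of each source.
p_expected = (manual_yes * auto_yes + (n - manual_yes) * (n - auto_yes)) / n**2

kappa = (p_observed - p_expected) / (1 - p_expected)
print(round(100 * p_observed), round(kappa, 2))  # prints: 96 0.78
```

With these assumed counts, a single discordant patient out of 25 is enough to bring Kappa down to about 0.78 even though raw agreement stays at 96%.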
Automatic extraction of clinical data from the EMR is more reliable, more precise and less time consuming than manual chart review. However, it requires precise variable definitions and study procedures to avoid data extraction errors.
Automatic extraction of the clinical data from EMR produces accurate and reliable data for research purposes.
Mykola Tsapenko, No Financial Disclosure Information; No Product/Research Disclosure Information