COVID-19 Outbreak Surveillance Through Text Mining Applied to EHRs

The COVID-19 pandemic caused significant disruptions to everyday life and has had social, political, and financial consequences that will persist for years, say Rocha and Solha, et al. (2024). Many technology-intensive initiatives were developed quickly in response; however, the authors add, technologies that enhance epidemiological surveillance in settings with low testing capacity and limited healthcare resources remain scarce. Their study aimed to address this gap by developing a data science model that uses routinely generated healthcare encounter records to detect possible new outbreaks early and in real time.

The researchers defined an epidemiological indicator that serves as a proxy for suspected cases of COVID-19, using the health records of emergency care unit (ECU) patients and text mining techniques. The open-field dataset comprised 2,760,862 medical records from nine ECUs; each record contains the patient’s age, reported symptoms, and the time and date of admission. They also used a dataset of 1,026,804 officially confirmed COVID-19 cases. The records span January 2020 to May 2022. The models were evaluated using the sample cross-correlation between two finite stochastic time series.
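To make the idea of a text-mined indicator concrete, the following is a minimal sketch, not the authors' pipeline. It assumes each record is a pair of admission date and free-text symptoms, and it flags a record as suspected when the text matches any term in a hypothetical keyword list (SUSPECT_TERMS); the study's actual vocabulary and matching rules are not given in this summary.

```python
import re
from collections import Counter
from datetime import date

# Hypothetical symptom terms, chosen only for illustration.
SUSPECT_TERMS = ("fever", "cough", "shortness of breath", "loss of smell")
PATTERN = re.compile("|".join(re.escape(t) for t in SUSPECT_TERMS), re.IGNORECASE)


def is_suspected(symptom_text):
    """Return True if the free-text symptoms mention any listed term."""
    return PATTERN.search(symptom_text) is not None


def daily_suspected_counts(records):
    """Count suspected records per admission date.

    `records` is an iterable of (admission_date, symptom_text) pairs.
    """
    counts = Counter()
    for admitted_on, symptom_text in records:
        if is_suspected(symptom_text):
            counts[admitted_on] += 1
    return dict(counts)


# Toy records, invented for the example.
records = [
    (date(2020, 3, 1), "Fever and dry cough for 3 days"),
    (date(2020, 3, 1), "Ankle sprain"),
    (date(2020, 3, 2), "Shortness of breath"),
]
indicator = daily_suspected_counts(records)
```

The resulting daily counts form a time series that can be compared against officially confirmed cases.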

For patients aged ≥ 18 years, the researchers found time-lag (τ) = 72 days and cross-correlation (ρ) ~ 0.82, τ = 25 days and ρ ~ 0.93, and τ = 17 days and ρ ~ 0.88 for the first, second, and third waves, respectively.
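As an illustration of how a time lag and cross-correlation of this kind can be estimated, the sketch below computes the standard sample cross-correlation between two equal-length series at each candidate lag and keeps the lag with the highest value. The function names, the synthetic series, and the 30-day search window are assumptions made for the example; the paper's exact procedure may differ.

```python
import math


def sample_cross_correlation(x, y, lag):
    """Sample cross-correlation between x[t] and y[t + lag], for lag >= 0.

    Uses the full-series means and standard deviations, with the
    lagged covariance divided by the series length n.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    std_x = math.sqrt(sum((v - mean_x) ** 2 for v in x) / n)
    std_y = math.sqrt(sum((v - mean_y) ** 2 for v in y) / n)
    cov = sum((x[t] - mean_x) * (y[t + lag] - mean_y) for t in range(n - lag)) / n
    return cov / (std_x * std_y)


def best_lag(x, y, max_lag):
    """Return (lag, correlation) maximizing the cross-correlation."""
    scores = {lag: sample_cross_correlation(x, y, lag) for lag in range(max_lag + 1)}
    lag = max(scores, key=scores.get)
    return lag, scores[lag]


# Synthetic data: the "confirmed" curve is the "indicator" curve
# delayed by 5 days. These series are invented for the example.
days = range(100)
indicator = [math.exp(-((t - 40) / 8) ** 2) for t in days]
confirmed = [math.exp(-((t - 45) / 8) ** 2) for t in days]

tau, rho = best_lag(indicator, confirmed, max_lag=30)
```

A positive lag here means the indicator series leads the confirmed-case series by that many days.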

The researchers conclude that the developed model can aid in the early detection of signs of possible new COVID-19 outbreaks, weeks before traditional surveillance systems, so that preventive and control actions in public health can be initiated earlier and with a higher likelihood of success.

Reference: Rocha and Solha, et al. COVID-19 outbreaks surveillance through text mining applied to electronic health records. BMC Infectious Diseases. Vol. 24, Article number 359 (2024)