A multi-step approach to managing missing data in time and patient variant electronic health records.

Citation metadata

Date: Feb. 17, 2022
From: BMC Research Notes (Vol. 15, Issue 1)
Publisher: BioMed Central Ltd.
Document Type: Report
Length: 2,747 words
Lexile Measure: 1410L

Abstract:

Objective: Electronic health records (EHR) hold promise for large-scale analyses that link individual characteristics to health outcomes. However, these data often contain many missing values at both the patient and the visit level, because data collection varies across facilities, providers, and clinical need. This study proposes a stepwise framework for imputing missing values in a visit-level EHR dataset. The framework combines informative missingness with conditional imputation and scales well, because it can be parallelized for efficiency.

Results: We use a subset of data from AMPATH covering 530,812 clinic visits by 16,316 Human Immunodeficiency Virus (HIV)-positive women across Western Kenya who have given birth. We apply the process to 84 clinical, social, and economic variables. We impute values for 84.6% of the variables with missing data, reducing missing data by approximately 35.6% on average. We validate the imputed dataset by using it to predict National Hospital Insurance Fund (NHIF) enrollment with 94.8% accuracy.

Keywords: Electronic medical records, HIV, Imputation, Big data
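The abstract describes the stepwise framework only at a high level. What follows is a minimal sketch, not the authors' implementation, under several stated assumptions. The visit-level table is held as a list of Python dicts keyed by hypothetical `patient_id` and `visit_date` fields, with `None` marking a missing value. "Informative missingness" is read as recording a `<variable>_missing` flag before any value is filled. "Conditional imputation" is replaced by a simple within-patient rule: carry the last observation forward, then back-fill gaps before the first observation. Because each patient is processed independently, the per-patient step can be mapped over a worker pool. The `average_reduction` helper computes a mean relative drop in missingness across variables. That definition is also an assumption and may not match how the reported 35.6% was calculated.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, List

# Hypothetical layout: one dict per clinic visit; every variable key is present,
# and None marks a missing value.
Visit = Dict[str, Any]


def impute_patient(visits: List[Visit], variables: List[str]) -> List[Visit]:
    """Impute one patient's visits in date order (assumed rules, not the paper's)."""
    ordered = sorted(visits, key=lambda v: v["visit_date"])
    out = [dict(v) for v in ordered]  # copies, so the input is not mutated
    for var in variables:
        # Step 1 (informative missingness): flag gaps before any filling.
        for row in out:
            row[var + "_missing"] = row[var] is None
        # Step 2: carry the last observed value forward within this patient.
        last = None
        for row in out:
            if row[var] is None:
                row[var] = last
            else:
                last = row[var]
        # Step 3: back-fill gaps that precede the patient's first observation.
        nxt = None
        for row in reversed(out):
            if row[var] is None:
                row[var] = nxt
            else:
                nxt = row[var]
    return out


def impute_all(visits: List[Visit], variables: List[str], workers: int = 4) -> List[Visit]:
    """Group visits by patient and impute each patient independently."""
    by_patient: Dict[Any, List[Visit]] = defaultdict(list)
    for v in visits:
        by_patient[v["patient_id"]].append(v)
    # Threads keep the sketch portable; for CPU-bound speedups, a
    # ProcessPoolExecutor with functools.partial(impute_patient, variables=variables)
    # would be needed, because lambdas cannot be pickled.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = pool.map(lambda vs: impute_patient(vs, variables), by_patient.values())
        return [row for chunk in chunks for row in chunk]


def missing_fraction(visits: List[Visit], var: str) -> float:
    """Share of visits in which var is missing."""
    return sum(v[var] is None for v in visits) / len(visits)


def average_reduction(before: List[Visit], after: List[Visit], variables: List[str]) -> float:
    """Mean relative drop in missingness over variables that had gaps (assumed metric)."""
    drops = []
    for var in variables:
        b = missing_fraction(before, var)
        if b > 0:
            drops.append((b - missing_fraction(after, var)) / b)
    return sum(drops) / len(drops) if drops else 0.0


if __name__ == "__main__":
    # Hypothetical toy data; variable names and values are illustrative only.
    visits = [
        {"patient_id": "A", "visit_date": "2015-01-10", "cd4": None, "nhif": 0},
        {"patient_id": "A", "visit_date": "2015-03-02", "cd4": 410, "nhif": None},
        {"patient_id": "A", "visit_date": "2015-06-15", "cd4": None, "nhif": None},
        {"patient_id": "B", "visit_date": "2015-02-01", "cd4": None, "nhif": None},
    ]
    imputed = impute_all(visits, ["cd4", "nhif"])
    print([r["cd4"] for r in imputed])  # prints [410, 410, 410, None]
    print(round(average_reduction(visits, imputed, ["cd4", "nhif"]), 3))  # prints 0.667
```

Under these assumed rules, a variable stays missing for any patient who never records it. Within-patient filling alone therefore cannot remove all missingness, and the missingness flags preserve the original gap pattern for downstream models.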

Source Citation

Gale Document Number: GALE|A694168220