Guiding Aging Research: Development of EHR and Neuroimaging Integrative Analysis methods (GARDENIA)

Overview

Clinical data sources such as Electronic Health Records (EHR) and medical claims data could serve as an enormous research resource for goals such as identifying modifiable risk factors for Alzheimer’s Disease (AD), but they have significant limitations, including poor data quality and missing data. To address these limitations, there is increasing interest in linking clinical data with research study data. Linking these sources promises to combine research-quality outcome measures based on neuropathological data with the rich information on potentially modifiable mid-life AD risk factors, such as comorbid conditions and medication exposures, that can be derived from clinical data. However, integrating heterogeneous, inconsistently measured data types from a large clinical database (e.g., diagnosis codes, prescription medications, imaging) with more consistently measured data on a smaller, targeted study population requires the development of novel methodologies.

In this research, we propose to address the challenges of integrating clinical and research databases by harnessing deep learning and transfer learning. We will combine neuroimaging data from the Alzheimer’s Disease Neuroimaging Initiative and cohort study data from the Adult Changes in Thought study with clinical data from the Kaiser Permanente Washington EHR to develop novel informatics approaches for identifying risk factors for AD.

Aims

  1. To develop a deep learning approach to data integration for heterogeneous clinical data linked to research study data, accounting for complex missing data patterns encountered in clinical data.
  2. To develop a data integration framework to support transfer learning from clinical data to research study data to advance statistical inference about risk factors for AD.

The long-term goal of this research is to accelerate research on AD by facilitating integration of heterogeneously collected clinical data and research study data to capitalize on the unique strengths of each data source.