Contextualized Embeddings for Biomedical Data

Project Description

Contextualized embeddings have revolutionized the field of machine learning, first as a means to encode text in natural language applications and later as a representational mechanism for other modalities, including image, longitudinal, and high-dimensional structured data. In recent years, embedding approaches have been proposed to address problems in biology and healthcare; however, many important questions still require further investigation. For instance: i) how to effectively integrate embeddings for discrete elements with continuous measurements, ii) how to integrate granular temporal or ordering information into embeddings, and iii) how to effectively create embeddings for multimodal data. Successful applicants will work toward developing a model prototype that addresses one of the questions above using state-of-the-art representation learning approaches based on deep learning architectures.
Program - BioScience
Division - Biological and Environmental Sciences and Engineering
Faculty Lab Link - https://rhenaog.github.io/
Center Affiliation - Computational Bioscience Research Center
Field of Study - Machine Learning, Representation Learning

About the Researcher

Ricardo Henao

Associate Professor, Bioscience

The theme of Professor Henao's research is the development of novel statistical methods and machine learning algorithms, primarily based on probabilistic modeling. His expertise covers several fields, including applied statistics, signal processing, pattern recognition, and machine learning. His methods research focuses on hierarchical or multilayer probabilistic models that describe complex data (for example, data characterized by high dimensionality, multiple modalities, more variables than observations, noisy measurements, missing values, or time series) in terms of low-dimensional representations, for the purposes of hypothesis generation and improved predictive modeling.

Most of his applied work is dedicated to the analysis of biological data such as gene expression, medical imaging, clinical narratives, and electronic health records. His recent work has focused on the development of sophisticated machine learning models, including deep learning approaches, for the analysis and interpretation of clinical and biological data, with applications to predictive modeling for diverse clinical outcomes.

Desired Project Deliverables

Deliverables include a literature review of the state of the art, an implementation of a model prototype, and experiments comparing the prototype to existing approaches in the literature.

Recommended Student Academic & Research Background

Machine Learning
Representation Learning
Deep Learning
Natural Language Processing