HKUST Red Bird Visiting Scholars Lecture Series

From Descriptive to Predictive: Universality and Language Models in Cancer Genomics

Abstract

Biology has traditionally been a descriptive science. Starting from the notion of universality in physics and mathematics, the speaker will explore a couple of approaches that move beyond the descriptive nature of biology research, with a special focus on cancer genomics. Cancers are seeded by mutations, both inherited and acquired during a lifetime. Although the speaker's research group and many others have extensively studied the role of protein mutations, the functional impact of most mutations found in cancer remains unknown. Foundation models have shown significant potential for integrating vast amounts of data and have had considerable impact across many scientific domains. First, the speaker will introduce the Evolutionary and Structure (ES) score, which builds on protein language models and AlphaFold. The ES score combines evolutionary information with protein structure prediction to prioritize functional regions in proteins. This approach has been validated experimentally in relapsed pediatric leukemias. Second, the speaker will present GET, a foundation model trained on chromatin accessibility data from 235 human cell types. GET predicts gene expression in previously unseen cell types and identifies both universal and cell-type-specific transcription factor interaction networks. Notable applications include the discovery of distal regulatory regions in fetal erythroblasts and the elucidation of the regulatory impact of germline coding mutations in lymphoma-associated transcription factors such as PAX5. Together, these computational methods illustrate how foundation models can support the study of biological data and yield functionally relevant insights.


About the Speaker

Prof. Raul RABADAN is the Gerald and Janet Carrus Professor in the Departments of Systems Biology, Biomedical Informatics and Surgery at Columbia University. He is currently the Director of the Program for Mathematical Genomics at Columbia University and was previously the Director of the NCI Center for Topology of Cancer Evolution and Heterogeneity at Columbia University (2015-2021). From 2001 to 2003, he was a Fellow in the Theoretical Physics Division at CERN, the European Organization for Nuclear Research, in Geneva, Switzerland. In 2003, he joined the Physics Group of the School of Natural Sciences at the Institute for Advanced Study in Princeton. In 2005, he became a Martin A. and Helen Chooljian Member at The Simons Center for Systems Biology at the Institute for Advanced Study in Princeton.

Prof. Rabadan's current research focuses on uncovering patterns of evolution in biological systems through the lens of genomics. His recent work includes the development of mathematical approaches, such as topological data analysis and random matrix theory, to uncover the evolution of cancer and infectious diseases.

Prof. Rabadan was named one of Popular Science's Brilliant 10 (2010) and a Stewart Trust Fellow (2013), and he received the Harold and Golden Lamport Award at Columbia University (2014) and the Diz Pintado Award (2018). He received the 2021 Outstanding Investigator Award from the National Cancer Institute. He is also a member of the Cancer Convergence Team of Stand Up to Cancer.


For Attendees' Attention

Seating is on a first come, first served basis.

Subscribe to the IAS Newsletter and stay informed.