Events
CS Peer Talks

Language Model for Biological Data

  • Xinyu Yuan, Mila
  • Time: 2023-10-20 16:00
  • Host: Turing Class Research Committee
  • Venue: Room 204, Courtyard No.5, Jingyuan

Abstract

In this presentation, we will focus on language models designed for biological data, such as proteins and cells. The talk is split into two parts. First, we will introduce "ProtST: Multi-Modality Learning of Protein Sequences and Biomedical Texts", a paper accepted as an oral presentation at ICML 2023, which studies how to move protein language models from primarily capturing co-evolutionary patterns to understanding protein functions, i.e., the roles proteins play in managing cell activities and life processes. Following this, we will give a brief overview of the latest progress in single-cell language models, especially for single-cell RNA sequencing data, a novel and promising avenue in the ML domain. Through this talk, we hope to share the idea that life science spans many different data modalities that are worth exploring, including not only the well-known protein sequences and biological texts, but also a wider array of data such as RNA transcriptomics, cellular structures, and more.

Biography

Xinyu Yuan is a second-year PhD student at Mila, advised by Prof. Jian Tang. Her research focuses on representation learning for biological data, aiming to harness AI's potential to advance the understanding of biological systems and processes, with a particular emphasis on scalability and transferability. On the technical side, she specializes in large-scale graph representation learning for knowledge graph completion, as well as pre-training, transfer learning, and multi-modality learning with language models. In particular, knowledge graphs serve as an important bridge connecting the different data modalities in the biological domain. At the same time, language models serve as foundational tools for each data modality, extracting effective representations and inferring latent patterns and deep structure that go well beyond human capacity. She also spent time at Intel AI Lab and obtained a bachelor's degree in computer science from Peking University.