Object-centric 3D scene understanding from videos
- Yash Bhalgat, University of Oxford
- Time: 2024-05-10 12:00
- Host: Turing Class Research Committee
- Venue: Room 101, Courtyard No. 5, Jingyuan + online talk
Abstract
The growing demand for immersive, interactive experiences has underscored the importance of 3D data in understanding our surroundings. Traditional methods for capturing 3D data are often complex and equipment-intensive. In contrast, my research aims to use unconstrained videos, such as those from augmented-reality glasses, to effortlessly capture scenes and objects in their full 3D complexity. As a first step, I will describe a method that incorporates epipolar geometry priors into multi-view Transformer models, enabling objects to be identified across extreme pose variations. Next, I will discuss my recent work on 3D object segmentation using pre-trained 2D foundation models, and then touch upon my ongoing work on Language+3D.
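As background for the first method mentioned in the abstract, the sketch below shows one generic way an epipolar geometry prior can be injected into cross-view attention: each query-key pair is scored by the distance from the key's pixel location to the epipolar line induced by the query's location and the fundamental matrix, and that distance becomes an additive bias on the attention logits. This is a minimal illustrative sketch, not the speaker's actual method; the function name `epipolar_attention_bias` and the parameters `F` and `sigma` are assumptions introduced here.

```python
import numpy as np

def epipolar_attention_bias(q_pts, k_pts, F, sigma=8.0):
    """Additive attention bias from an epipolar geometry prior (illustrative sketch).

    q_pts: (Nq, 2) pixel coords in view 1; k_pts: (Nk, 2) pixel coords in view 2;
    F: (3, 3) fundamental matrix mapping view-1 points to view-2 epipolar lines.
    Keys far from a query's epipolar line receive a large negative bias.
    """
    # Homogenize query points and map each to its epipolar line in view 2: l = F @ x.
    q_h = np.hstack([q_pts, np.ones((len(q_pts), 1))])         # (Nq, 3)
    lines = q_h @ F.T                                          # (Nq, 3), rows (a, b, c)

    # Point-to-line distance |a*u + b*v + c| / sqrt(a^2 + b^2) for every (query, key) pair.
    k_h = np.hstack([k_pts, np.ones((len(k_pts), 1))])         # (Nk, 3)
    num = np.abs(lines @ k_h.T)                                # (Nq, Nk)
    den = np.linalg.norm(lines[:, :2], axis=1, keepdims=True)  # (Nq, 1)
    dist = num / np.maximum(den, 1e-8)

    # Gaussian falloff in pixels, added to attention logits before the softmax.
    return -(dist / sigma) ** 2

# Usage (hypothetical): bias the standard scaled dot-product attention logits.
# logits = (Q @ K.T) / np.sqrt(d) + epipolar_attention_bias(q_pts, k_pts, F)
```

The design choice here is the standard one for geometric priors in attention: a soft additive bias (rather than a hard mask) lets the model down-weight geometrically implausible correspondences while still allowing gradients to flow everywhere.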
Biography
Yash Bhalgat is a third-year PhD student in the University of Oxford's Visual Geometry Group (VGG), supervised by Andrew Zisserman, Andrea Vedaldi, João Henriques, and Iro Laina. His research is broadly in 3D computer vision and machine learning, with a specific focus on geometry-aware deep networks (Transformers), neural rendering, 3D reconstruction, and understanding with language. Previously, he was a Senior Researcher at Qualcomm AI Research in California, working on efficient deep learning. He received his Master's in Computer Science from the University of Michigan, Ann Arbor, and his Bachelor's in Electrical Engineering (with a CS minor) from IIT Bombay.