Tuesday, January 14, 2025 at 11:30
Meyer Building Room 1061 & Zoom
This talk will explore advances in 3D scene reconstruction, focusing on approaches for estimating camera poses and scene structure in challenging multiview settings and in scenes with dynamic content. First, I will outline foundational aspects of my earlier work, in which we characterized the algebraic structure of fundamental and essential matrices in multiview settings and developed deep learning methods for jointly recovering camera parameters and sparse 3D scene structure. The main part of the talk introduces TracksTo4D (NeurIPS 2024), a novel, efficient method for reconstructing dynamic 3D structure and camera motion from casual videos. TracksTo4D uses a dedicated encoder, trained in an unsupervised manner on a dataset of casual videos, that takes 2D point tracks as input and infers dynamic 3D structure and camera motion. Our architecture accounts for symmetries in the problem, constrains the reconstruction to be low-rank, and models both static and dynamic scene components. Our model generalizes well to unseen videos from new categories, achieving accurate 3D reconstruction and camera localization in a single feed-forward pass while drastically reducing running time.
Short bio: Yoni Kasten is a senior research scientist at NVIDIA Research in Tel Aviv, in Prof. Gal Chechik’s team. His research in 3D computer vision focuses on algebraic characterizations of multi-camera systems and on deep neural models for surface reconstruction, dynamic scene modeling, and 4D scene reconstruction. Yoni earned his PhD at the Weizmann Institute, where his work on structure-from-motion estimation using algebraic characterizations, supervised by Prof. Ronen Basri, received the John F. Kennedy Prize for Outstanding Doctoral Research. He earned his M.Sc. in Computer Science at the Hebrew University of Jerusalem, under the supervision of Prof. Shmuel Peleg and Prof. Michael Werman.